Search Results: "merge"

21 November 2023

Mike Hommey: How I (kind of) killed Mercurial at Mozilla

Did you hear the news? Firefox development is moving from Mercurial to Git. While the decision is far from being mine, and I was barely involved in the small incremental changes that ultimately led to this decision, I feel I have to take at least some responsibility. And if you are one of those who would rather use Mercurial than Git, you may direct all your ire at me. But let's take a step back and review the past 25 years leading to this decision. You'll forgive me for skipping some details and any possible inaccuracies. This is already a long post, while I could have been more thorough, even I think that would have been too much. This is also not an official Mozilla position, only my personal perception and recollection as someone who was involved at times, but mostly an observer from a distance. From CVS to DVCS From its release in 1998, the Mozilla source code was kept in a CVS repository. If you're too young to know what CVS is, let's just say it's an old school version control system, with its set of problems. Back then, it was mostly ubiquitous in the Open Source world, as far as I remember. In the early 2000s, the Subversion version control system gained some traction, solving some of the problems that came with CVS. Incidentally, Subversion was created by Jim Blandy, who now works at Mozilla on completely unrelated matters. In the same period, the Linux kernel development moved from CVS to Bitkeeper, which was more suitable to the distributed nature of the Linux community. BitKeeper had its own problem, though: it was the opposite of Open Source, but for most pragmatic people, it wasn't a real concern because free access was provided. Until it became a problem: someone at OSDL developed an alternative client to BitKeeper, and licenses of BitKeeper were rescinded for OSDL members, including Linus Torvalds (they were even prohibited from purchasing one). Following this fiasco, in April 2005, two weeks from each other, both Git and Mercurial were born. The former was created by Linus Torvalds himself, while the latter was developed by Olivia Mackall, who was a Linux kernel developer back then. And because they both came out of the same community for the same needs, and the same shared experience with BitKeeper, they both were similar distributed version control systems. Interestingly enough, several other DVCSes existed: In this landscape, the major difference Git was making at the time was that it was blazing fast. Almost incredibly so, at least on Linux systems. That was less true on other platforms (especially Windows). It was a game-changer for handling large codebases in a smooth manner. Anyways, two years later, in 2007, Mozilla decided to move its source code not to Bzr, not to Git, not to Subversion (which, yes, was a contender), but to Mercurial. The decision "process" was laid down in two rather colorful blog posts. My memory is a bit fuzzy, but I don't recall that it was a particularly controversial choice. All of those DVCSes were still young, and there was no definite "winner" yet (GitHub hadn't even been founded). It made the most sense for Mozilla back then, mainly because the Git experience on Windows still wasn't there, and that mattered a lot for Mozilla, with its diverse platform support. As a contributor, I didn't think much of it, although to be fair, at the time, I was mostly consuming the source tarballs. Personal preferences Digging through my archives, I've unearthed a forgotten chapter: I did end up setting up both a Mercurial and a Git mirror of the Firefox source repository on alioth.debian.org. Alioth.debian.org was a FusionForge-based collaboration system for Debian developers, similar to SourceForge. It was the ancestor of salsa.debian.org. I used those mirrors for the Debian packaging of Firefox (cough cough Iceweasel). The Git mirror was created with hg-fast-export, and the Mercurial mirror was only a necessary step in the process. By that time, I had converted my Subversion repositories to Git, and switched off SVK. Incidentally, I started contributing to Git around that time as well. I apparently did this not too long after Mozilla switched to Mercurial. As a Linux user, I think I just wanted the speed that Mercurial was not providing. Not that Mercurial was that slow, but the difference between a couple seconds and a couple hundred milliseconds was a significant enough difference in user experience for me to prefer Git (and Firefox was not the only thing I was using version control for) Other people had also similarly created their own mirror, or with other tools. But none of them were "compatible": their commit hashes were different. Hg-git, used by the latter, was putting extra information in commit messages that would make the conversion differ, and hg-fast-export would just not be consistent with itself! My mirror is long gone, and those have not been updated in more than a decade. I did end up using Mercurial, when I got commit access to the Firefox source repository in April 2010. I still kept using Git for my Debian activities, but I now was also using Mercurial to push to the Mozilla servers. I joined Mozilla as a contractor a few months after that, and kept using Mercurial for a while, but as a, by then, long time Git user, it never really clicked for me. It turns out, the sentiment was shared by several at Mozilla. Git incursion In the early 2010s, GitHub was becoming ubiquitous, and the Git mindshare was getting large. Multiple projects at Mozilla were already entirely hosted on GitHub. As for the Firefox source code base, Mozilla back then was kind of a Wild West, and engineers being engineers, multiple people had been using Git, with their own inconvenient workflows involving a local Mercurial clone. The most popular set of scripts was moz-git-tools, to incorporate changes in a local Git repository into the local Mercurial copy, to then send to Mozilla servers. In terms of the number of people doing that, though, I don't think it was a lot of people, probably a few handfuls. On my end, I was still keeping up with Mercurial. I think at that time several engineers had their own unofficial Git mirrors on GitHub, and later on Ehsan Akhgari provided another mirror, with a twist: it also contained the full CVS history, which the canonical Mercurial repository didn't have. This was particularly interesting for engineers who needed to do some code archeology and couldn't get past the 2007 cutoff of the Mercurial repository. I think that mirror ultimately became the official-looking, but really unofficial, mozilla-central repository on GitHub. On a side note, a Mercurial repository containing the CVS history was also later set up, but that didn't lead to something officially supported on the Mercurial side. Some time around 2011~2012, I started to more seriously consider using Git for work myself, but wasn't satisfied with the workflows others had set up for themselves. I really didn't like the idea of wasting extra disk space keeping a Mercurial clone around while using a Git mirror. I wrote a Python script that would use Mercurial as a library to access a remote repository and produce a git-fast-import stream. That would allow the creation of a git repository without a local Mercurial clone. It worked quite well, but it was not able to incrementally update. Other, more complete tools existed already, some of which I mentioned above. But as time was passing and the size and depth of the Mercurial repository was growing, these tools were showing their limits and were too slow for my taste, especially for the initial clone. Boot to Git In the same time frame, Mozilla ventured in the Mobile OS sphere with Boot to Gecko, later known as Firefox OS. What does that have to do with version control? The needs of third party collaborators in the mobile space led to the creation of what is now the gecko-dev repository on GitHub. As I remember it, it was challenging to create, but once it was there, Git users could just clone it and have a working, up-to-date local copy of the Firefox source code and its history... which they could already have, but this was the first officially supported way of doing so. Coincidentally, Ehsan's unofficial mirror was having trouble (to the point of GitHub closing the repository) and was ultimately shut down in December 2013. You'll often find comments on the interwebs about how GitHub has become unreliable since the Microsoft acquisition. I can't really comment on that, but if you think GitHub is unreliable now, rest assured that it was worse in its beginning. And its sustainability as a platform also wasn't a given, being a rather new player. So on top of having this official mirror on GitHub, Mozilla also ventured in setting up its own Git server for greater control and reliability. But the canonical repository was still the Mercurial one, and while Git users now had a supported mirror to pull from, they still had to somehow interact with Mercurial repositories, most notably for the Try server. Git slowly creeping in Firefox build tooling Still in the same time frame, tooling around building Firefox was improving drastically. For obvious reasons, when version control integration was needed in the tooling, Mercurial support was always a no-brainer. The first explicit acknowledgement of a Git repository for the Firefox source code, other than the addition of the .gitignore file, was bug 774109. It added a script to install the prerequisites to build Firefox on macOS (still called OSX back then), and that would print a message inviting people to obtain a copy of the source code with either Mercurial or Git. That was a precursor to current bootstrap.py, from September 2012. Following that, as far as I can tell, the first real incursion of Git in the Firefox source tree tooling happened in bug 965120. A few days earlier, bug 952379 had added a mach clang-format command that would apply clang-format-diff to the output from hg diff. Obviously, running hg diff on a Git working tree didn't work, and bug 965120 was filed, and support for Git was added there. That was in January 2014. A year later, when the initial implementation of mach artifact was added (which ultimately led to artifact builds), Git users were an immediate thought. But while they were considered, it was not to support them, but to avoid actively breaking their workflows. Git support for mach artifact was eventually added 14 months later, in March 2016. From gecko-dev to git-cinnabar Let's step back a little here, back to the end of 2014. My user experience with Mercurial had reached a level of dissatisfaction that was enough for me to decide to take that script from a couple years prior and make it work for incremental updates. That meant finding a way to store enough information locally to be able to reconstruct whatever the incremental updates would be relying on (guess why other tools hid a local Mercurial clone under hood). I got something working rather quickly, and after talking to a few people about this side project at the Mozilla Portland All Hands and seeing their excitement, I published a git-remote-hg initial prototype on the last day of the All Hands. Within weeks, the prototype gained the ability to directly push to Mercurial repositories, and a couple months later, was renamed to git-cinnabar. At that point, as a Git user, instead of cloning the gecko-dev repository from GitHub and switching to a local Mercurial repository whenever you needed to push to a Mercurial repository (i.e. the aforementioned Try server, or, at the time, for reviews), you could just clone and push directly from/to Mercurial, all within Git. And it was fast too. You could get a full clone of mozilla-central in less than half an hour, when at the time, other similar tools would take more than 10 hours (needless to say, it's even worse now). Another couple months later (we're now at the end of April 2015), git-cinnabar became able to start off a local clone of the gecko-dev repository, rather than clone from scratch, which could be time consuming. But because git-cinnabar and the tool that was updating gecko-dev weren't producing the same commits, this setup was cumbersome and not really recommended. For instance, if you pushed something to mozilla-central with git-cinnabar from a gecko-dev clone, it would come back with a different commit hash in gecko-dev, and you'd have to deal with the divergence. Eventually, in April 2020, the scripts updating gecko-dev were switched to git-cinnabar, making the use of gecko-dev alongside git-cinnabar a more viable option. Ironically(?), the switch occurred to ease collaboration with KaiOS (you know, the mobile OS born from the ashes of Firefox OS). Well, okay, in all honesty, when the need of syncing in both directions between Git and Mercurial (we only had ever synced from Mercurial to Git) came up, I nudged Mozilla in the direction of git-cinnabar, which, in my (biased but still honest) opinion, was the more reliable option for two-way synchronization (we did have regular conversion problems with hg-git, nothing of the sort has happened since the switch). One Firefox repository to rule them all For reasons I don't know, Mozilla decided to use separate Mercurial repositories as "branches". With the switch to the rapid release process in 2011, that meant one repository for nightly (mozilla-central), one for aurora, one for beta, and one for release. And with the addition of Extended Support Releases in 2012, we now add a new ESR repository every year. Boot to Gecko also had its own branches, and so did Fennec (Firefox for Mobile, before Android). There are a lot of them. And then there are also integration branches, where developer's work lands before being merged in mozilla-central (or backed out if it breaks things), always leaving mozilla-central in a (hopefully) good state. Only one of them remains in use today, though. I can only suppose that the way Mercurial branches work was not deemed practical. It is worth noting, though, that Mercurial branches are used in some cases, to branch off a dot-release when the next major release process has already started, so it's not a matter of not knowing the feature exists or some such. In 2016, Gregory Szorc set up a new repository that would contain them all (or at least most of them), which eventually became what is now the mozilla-unified repository. This would e.g. simplify switching between branches when necessary. 7 years later, for some reason, the other "branches" still exist, but most developers are expected to be using mozilla-unified. Mozilla's CI also switched to using mozilla-unified as base repository. Honestly, I'm not sure why the separate repositories are still the main entry point for pushes, rather than going directly to mozilla-unified, but it probably comes down to switching being work, and not being a top priority. Also, it probably doesn't help that working with multiple heads in Mercurial, even (especially?) with bookmarks, can be a source of confusion. To give an example, if you aren't careful, and do a plain clone of the mozilla-unified repository, you may not end up on the latest mozilla-central changeset, but rather, e.g. one from beta, or some other branch, depending which one was last updated. Hosting is simple, right? Put your repository on a server, install hgweb or gitweb, and that's it? Maybe that works for... Mercurial itself, but that repository "only" has slightly over 50k changesets and less than 4k files. Mozilla-central has more than an order of magnitude more changesets (close to 700k) and two orders of magnitude more files (more than 700k if you count the deleted or moved files, 350k if you count the currently existing ones). And remember, there are a lot of "duplicates" of this repository. And I didn't even mention user repositories and project branches. Sure, it's a self-inflicted pain, and you'd think it could probably(?) be mitigated with shared repositories. But consider the simple case of two repositories: mozilla-central and autoland. You make autoland use mozilla-central as a shared repository. Now, you push something new to autoland, it's stored in the autoland datastore. Eventually, you merge to mozilla-central. Congratulations, it's now in both datastores, and you'd need to clean-up autoland if you wanted to avoid the duplication. Now, you'd think mozilla-unified would solve these issues, and it would... to some extent. Because that wouldn't cover user repositories and project branches briefly mentioned above, which in GitHub parlance would be considered as Forks. So you'd want a mega global datastore shared by all repositories, and repositories would need to only expose what they really contain. Does Mercurial support that? I don't think so (okay, I'll give you that: even if it doesn't, it could, but that's extra work). And since we're talking about a transition to Git, does Git support that? You may have read about how you can link to a commit from a fork and make-pretend that it comes from the main repository on GitHub? At least, it shows a warning, now. That's essentially the architectural reason why. So the actual answer is that Git doesn't support it out of the box, but GitHub has some backend magic to handle it somehow (and hopefully, other things like Gitea, Girocco, Gitlab, etc. have something similar). Now, to come back to the size of the repository. A repository is not a static file. It's a server with which you negotiate what you have against what it has that you want. Then the server bundles what you asked for based on what you said you have. Or in the opposite direction, you negotiate what you have that it doesn't, you send it, and the server incorporates what you sent it. Fortunately the latter is less frequent and requires authentication. But the former is more frequent and CPU intensive. Especially when pulling a large number of changesets, which, incidentally, cloning is. "But there is a solution for clones" you might say, which is true. That's clonebundles, which offload the CPU intensive part of cloning to a single job scheduled regularly. Guess who implemented it? Mozilla. But that only covers the cloning part. We actually had laid the ground to support offloading large incremental updates and split clones, but that never materialized. Even with all that, that still leaves you with a server that can display file contents, diffs, blames, provide zip archives of a revision, and more, all of which are CPU intensive in their own way. And these endpoints are regularly abused, and cause extra load to your servers, yes plural, because of course a single server won't handle the load for the number of users of your big repositories. And because your endpoints are abused, you have to close some of them. And I'm not mentioning the Try repository with its tens of thousands of heads, which brings its own sets of problems (and it would have even more heads if we didn't fake-merge them once in a while). Of course, all the above applies to Git (and it only gained support for something akin to clonebundles last year). So, when the Firefox OS project was stopped, there wasn't much motivation to continue supporting our own Git server, Mercurial still being the official point of entry, and git.mozilla.org was shut down in 2016. The growing difficulty of maintaining the status quo Slowly, but steadily in more recent years, as new tooling was added that needed some input from the source code manager, support for Git was more and more consistently added. But at the same time, as people left for other endeavors and weren't necessarily replaced, or more recently with layoffs, resources allocated to such tooling have been spread thin. Meanwhile, the repository growth didn't take a break, and the Try repository was becoming an increasing pain, with push times quite often exceeding 10 minutes. The ongoing work to move Try pushes to Lando will hide the problem under the rug, but the underlying problem will still exist (although the last version of Mercurial seems to have improved things). On the flip side, more and more people have been relying on Git for Firefox development, to my own surprise, as I didn't really push for that to happen. It just happened organically, by ways of git-cinnabar existing, providing a compelling experience to those who prefer Git, and, I guess, word of mouth. I was genuinely surprised when I recently heard the use of Git among moz-phab users had surpassed a third. I did, however, occasionally orient people who struggled with Mercurial and said they were more familiar with Git, towards git-cinnabar. I suspect there's a somewhat large number of people who never realized Git was a viable option. But that, on its own, can come with its own challenges: if you use git-cinnabar without being backed by gecko-dev, you'll have a hard time sharing your branches on GitHub, because you can't push to a fork of gecko-dev without pushing your entire local repository, as they have different commit histories. And switching to gecko-dev when you weren't already using it requires some extra work to rebase all your local branches from the old commit history to the new one. Clone times with git-cinnabar have also started to go a little out of hand in the past few years, but this was mitigated in a similar manner as with the Mercurial cloning problem: with static files that are refreshed regularly. Ironically, that made cloning with git-cinnabar faster than cloning with Mercurial. But generating those static files is increasingly time-consuming. As of writing, generating those for mozilla-unified takes close to 7 hours. I was predicting clone times over 10 hours "in 5 years" in a post from 4 years ago, I wasn't too far off. With exponential growth, it could still happen, although to be fair, CPUs have improved since. I will explore the performance aspect in a subsequent blog post, alongside the upcoming release of git-cinnabar 0.7.0-b1. I don't even want to check how long it now takes with hg-git or git-remote-hg (they were already taking more than a day when git-cinnabar was taking a couple hours). I suppose it's about time that I clarify that git-cinnabar has always been a side-project. It hasn't been part of my duties at Mozilla, and the extent to which Mozilla supports git-cinnabar is in the form of taskcluster workers on the community instance for both git-cinnabar CI and generating those clone bundles. Consequently, that makes the above git-cinnabar specific issues a Me problem, rather than a Mozilla problem. Taking the leap I can't talk for the people who made the proposal to move to Git, nor for the people who put a green light on it. But I can at least give my perspective. Developers have regularly asked why Mozilla was still using Mercurial, but I think it was the first time that a formal proposal was laid out. And it came from the Engineering Workflow team, responsible for issue tracking, code reviews, source control, build and more. It's easy to say "Mozilla should have chosen Git in the first place", but back in 2007, GitHub wasn't there, Bitbucket wasn't there, and all the available options were rather new (especially compared to the then 21 years-old CVS). I think Mozilla made the right choice, all things considered. Had they waited a couple years, the story might have been different. You might say that Mozilla stayed with Mercurial for so long because of the sunk cost fallacy. I don't think that's true either. But after the biggest Mercurial repository hosting service turned off Mercurial support, and the main contributor to Mercurial going their own way, it's hard to ignore that the landscape has evolved. And the problems that we regularly encounter with the Mercurial servers are not going to get any better as the repository continues to grow. As far as I know, all the Mercurial repositories bigger than Mozilla's are... not using Mercurial. Google has its own closed-source server, and Facebook has another of its own, and it's not really public either. With resources spread thin, I don't expect Mozilla to be able to continue supporting a Mercurial server indefinitely (although I guess Octobus could be contracted to give a hand, but is that sustainable?). Mozilla, being a champion of Open Source, also doesn't live in a silo. At some point, you have to meet your contributors where they are. And the Open Source world is now majoritarily using Git. I'm sure the vast majority of new hires at Mozilla in the past, say, 5 years, know Git and have had to learn Mercurial (although they arguably didn't need to). Even within Mozilla, with thousands(!) of repositories on GitHub, Firefox is now actually the exception rather than the norm. I should even actually say Desktop Firefox, because even Mobile Firefox lives on GitHub (although Fenix is moving back in together with Desktop Firefox, and the timing is such that that will probably happen before Firefox moves to Git). Heck, even Microsoft moved to Git! With a significant developer base already using Git thanks to git-cinnabar, and all the constraints and problems I mentioned previously, it actually seems natural that a transition (finally) happens. However, had git-cinnabar or something similarly viable not existed, I don't think Mozilla would be in a position to take this decision. On one hand, it probably wouldn't be in the current situation of having to support both Git and Mercurial in the tooling around Firefox, nor the resource constraints related to that. But on the other hand, it would be farther from supporting Git and being able to make the switch in order to address all the other problems. But... GitHub? I hope I made a compelling case that hosting is not as simple as it can seem, at the scale of the Firefox repository. It's also not Mozilla's main focus. Mozilla has enough on its plate with the migration of existing infrastructure that does rely on Mercurial to understandably not want to figure out the hosting part, especially with limited resources, and with the mixed experience hosting both Mercurial and git has been so far. After all, GitHub couldn't even display things like the contributors' graph on gecko-dev until recently, and hosting is literally their job! They still drop the ball on large blames (thankfully we have searchfox for those). Where does that leave us? Gitlab? For those criticizing GitHub for being proprietary, that's probably not open enough. Cloud Source Repositories? "But GitHub is Microsoft" is a complaint I've read a lot after the announcement. Do you think Google hosting would have appealed to these people? Bitbucket? I'm kind of surprised it wasn't in the list of providers that were considered, but I'm also kind of glad it wasn't (and I'll leave it at that). I think the only relatively big hosting provider that could have made the people criticizing the choice of GitHub happy is Codeberg, but I hadn't even heard of it before it was mentioned in response to Mozilla's announcement. But really, with literal thousands of Mozilla repositories already on GitHub, with literal tens of millions repositories on the platform overall, the pragmatic in me can't deny that it's an attractive option (and I can't stress enough that I wasn't remotely close to the room where the discussion about what choice to make happened). "But it's a slippery slope". I can see that being a real concern. LLVM also moved its repository to GitHub (from a (I think) self-hosted Subversion server), and ended up moving off Bugzilla and Phabricator to GitHub issues and PRs four years later. As an occasional contributor to LLVM, I hate this move. I hate the GitHub review UI with a passion. At least, right now, GitHub PRs are not a viable option for Mozilla, for their lack of support for security related PRs, and the more general shortcomings in the review UI. That doesn't mean things won't change in the future, but let's not get too far ahead of ourselves. The move to Git has just been announced, and the migration has not even begun yet. Just because Mozilla is moving the Firefox repository to GitHub doesn't mean it's locked in forever or that all the eggs are going to be thrown into one basket. If bridges need to be crossed in the future, we'll see then. So, what's next? The official announcement said we're not expecting the migration to really begin until six months from now. I'll swim against the current here, and say this: the earlier you can switch to git, the earlier you'll find out what works and what doesn't work for you, whether you already know Git or not. While there is not one unique workflow, here's what I would recommend anyone who wants to take the leap off Mercurial right now: As there is no one-size-fits-all workflow, I won't tell you how to organize yourself from there. I'll just say this: if you know the Mercurial sha1s of your previous local work, you can create branches for them with:
$ git branch <branch_name> $(git cinnabar hg2git <hg_sha1>)
At this point, you should have everything available on the Git side, and you can remove the .hg directory. Or move it into some empty directory somewhere else, just in case. But don't leave it here, it will only confuse the tooling. Artifact builds WILL be confused, though, and you'll have to ./mach configure before being able to do anything. You may also hit bug 1865299 if your working tree is older than this post. If you have any problem or question, you can ping me on #git-cinnabar or #git on Matrix. I'll put the instructions above somewhere on wiki.mozilla.org, and we can collaboratively iterate on them. Now, what the announcement didn't say is that the Git repository WILL NOT be gecko-dev, doesn't exist yet, and WON'T BE COMPATIBLE (trust me, it'll be for the better). Why did I make you do all the above, you ask? Because that won't be a problem. I'll have you covered, I promise. The upcoming release of git-cinnabar 0.7.0-b1 will have a way to smoothly switch between gecko-dev and the future repository (incidentally, that will also allow to switch from a pure git-cinnabar clone to a gecko-dev one, for the git-cinnabar users who have kept reading this far). What about git-cinnabar? With Mercurial going the way of the dodo at Mozilla, my own need for git-cinnabar will vanish. Legitimately, this begs the question whether it will still be maintained. I can't answer for sure. I don't have a crystal ball. However, the needs of the transition itself will motivate me to finish some long-standing things (like finalizing the support for pushing merges, which is currently behind an experimental flag) or implement some missing features (support for creating Mercurial branches). Git-cinnabar started as a Python script, it grew a sidekick implemented in C, which then incorporated some Rust, which then cannibalized the Python script and took its place. It is now close to 90% Rust, and 10% C (if you don't count the code from Git that is statically linked to it), and has sort of become my Rust playground (it's also, I must admit, a mess, because of its history, but it's getting better). So the day to day use with Mercurial is not my sole motivation to keep developing it. If it were, it would stay stagnant, because all the features I need are there, and the speed is not all that bad, although I know it could be better. Arguably, though, git-cinnabar has been relatively stagnant feature-wise, because all the features I need are there. So, no, I don't expect git-cinnabar to die along Mercurial use at Mozilla, but I can't really promise anything either. Final words That was a long post. But there was a lot of ground to cover. And I still skipped over a bunch of things. I hope I didn't bore you to death. If I did and you're still reading... what's wrong with you? ;) So this is the end of Mercurial at Mozilla. So long, and thanks for all the fish. But this is also the beginning of a transition that is not easy, and that will not be without hiccups, I'm sure. So fasten your seatbelts (plural), and welcome the change. To circle back to the clickbait title, did I really kill Mercurial at Mozilla? Of course not. But it's like I stumbled upon a few sparks and tossed a can of gasoline on them. I didn't start the fire, but I sure made it into a proper bonfire... and now it has turned into a wildfire. And who knows? 15 years from now, someone else might be looking back at how Mozilla picked Git at the wrong time, and that, had we waited a little longer, we would have picked some yet to come new horse. But hey, that's the tech cycle for you.

12 November 2023

Petter Reinholdtsen: New and improved sqlcipher in Debian for accessing Signal database

For a while now I wanted to have direct access to the Signal database of messages and channels of my Desktop edition of Signal. I prefer the enforced end to end encryption of Signal these days for my communication with friends and family, to increase the level of safety and privacy as well as raising the cost of the mass surveillance government and non-government entities practice these days. In August I came across a nice recipe on how to use sqlcipher to extract statistics from the Signal database explaining how to do this. Unfortunately this did not work with the version of sqlcipher in Debian. The sqlcipher package is a "fork" of the sqlite package with added support for encrypted databases. Sadly the current Debian maintainer announced more than three years ago that he did not have time to maintain sqlcipher, so it seemed unlikely to be upgraded by the maintainer. I was reluctant to take on the job myself, as I have very limited experience maintaining shared libraries in Debian. After waiting and hoping for a few months, I gave up the last week, and set out to update the package. In the process I orphaned it to make it more obvious for the next person looking at it that the package need proper maintenance. The version in Debian was around five years old, and quite a lot of changes had taken place upstream into the Debian maintenance git repository. After spending a few days importing the new upstream versions, realising that upstream did not care much for SONAME versioning as I saw library symbols being both added and removed with minor version number changes to the project, I concluded that I had to do a SONAME bump of the library package to avoid surprising the reverse dependencies. I even added a simple autopkgtest script to ensure the package work as intended. Dug deep into the hole of learning shared library maintenance, I set out a few days ago to upload the new version to Debian experimental to see what the quality assurance framework in Debian had to say about the result. The feedback told me the pacakge was not too shabby, and yesterday I uploaded the latest version to Debian unstable. It should enter testing today or tomorrow, perhaps delayed by a small library transition. Armed with a new version of sqlcipher, I can now have a look at the SQL database in ~/.config/Signal/sql/db.sqlite. First, one need to fetch the encryption key from the Signal configuration using this simple JSON extraction command:
/usr/bin/jq -r '."key"' ~/.config/Signal/config.json
Assuming the result from that command is 'secretkey', which is a hexadecimal number representing the key used to encrypt the database. Next, one can now connect to the database and inject the encryption key for access via SQL to fetch information from the database. Here is an example dumping the database structure:
% sqlcipher ~/.config/Signal/sql/db.sqlite
sqlite> PRAGMA key = "x'secretkey'";
sqlite> .schema
CREATE TABLE sqlite_stat1(tbl,idx,stat);
CREATE TABLE conversations(
      id STRING PRIMARY KEY ASC,
      json TEXT,
      active_at INTEGER,
      type STRING,
      members TEXT,
      name TEXT,
      profileName TEXT
    , profileFamilyName TEXT, profileFullName TEXT, e164 TEXT, serviceId TEXT, groupId TEXT, profileLastFetchedAt INTEGER);
CREATE TABLE identityKeys(
      id STRING PRIMARY KEY ASC,
      json TEXT
    );
CREATE TABLE items(
      id STRING PRIMARY KEY ASC,
      json TEXT
    );
CREATE TABLE sessions(
      id TEXT PRIMARY KEY,
      conversationId TEXT,
      json TEXT
    , ourServiceId STRING, serviceId STRING);
CREATE TABLE attachment_downloads(
    id STRING primary key,
    timestamp INTEGER,
    pending INTEGER,
    json TEXT
  );
CREATE TABLE sticker_packs(
    id TEXT PRIMARY KEY,
    key TEXT NOT NULL,
    author STRING,
    coverStickerId INTEGER,
    createdAt INTEGER,
    downloadAttempts INTEGER,
    installedAt INTEGER,
    lastUsed INTEGER,
    status STRING,
    stickerCount INTEGER,
    title STRING
  , attemptedStatus STRING, position INTEGER DEFAULT 0 NOT NULL, storageID STRING, storageVersion INTEGER, storageUnknownFields BLOB, storageNeedsSync
      INTEGER DEFAULT 0 NOT NULL);
CREATE TABLE stickers(
    id INTEGER NOT NULL,
    packId TEXT NOT NULL,
    emoji STRING,
    height INTEGER,
    isCoverOnly INTEGER,
    lastUsed INTEGER,
    path STRING,
    width INTEGER,
    PRIMARY KEY (id, packId),
    CONSTRAINT stickers_fk
      FOREIGN KEY (packId)
      REFERENCES sticker_packs(id)
      ON DELETE CASCADE
  );
CREATE TABLE sticker_references(
    messageId STRING,
    packId TEXT,
    CONSTRAINT sticker_references_fk
      FOREIGN KEY(packId)
      REFERENCES sticker_packs(id)
      ON DELETE CASCADE
  );
CREATE TABLE emojis(
    shortName TEXT PRIMARY KEY,
    lastUsage INTEGER
  );
CREATE TABLE messages(
        rowid INTEGER PRIMARY KEY ASC,
        id STRING UNIQUE,
        json TEXT,
        readStatus INTEGER,
        expires_at INTEGER,
        sent_at INTEGER,
        schemaVersion INTEGER,
        conversationId STRING,
        received_at INTEGER,
        source STRING,
        hasAttachments INTEGER,
        hasFileAttachments INTEGER,
        hasVisualMediaAttachments INTEGER,
        expireTimer INTEGER,
        expirationStartTimestamp INTEGER,
        type STRING,
        body TEXT,
        messageTimer INTEGER,
        messageTimerStart INTEGER,
        messageTimerExpiresAt INTEGER,
        isErased INTEGER,
        isViewOnce INTEGER,
        sourceServiceId TEXT, serverGuid STRING NULL, sourceDevice INTEGER, storyId STRING, isStory INTEGER
        GENERATED ALWAYS AS (type IS 'story'), isChangeCreatedByUs INTEGER NOT NULL DEFAULT 0, isTimerChangeFromSync INTEGER
        GENERATED ALWAYS AS (
          json_extract(json, '$.expirationTimerUpdate.fromSync') IS 1
        ), seenStatus NUMBER default 0, storyDistributionListId STRING, expiresAt INT
        GENERATED ALWAYS
        AS (ifnull(
          expirationStartTimestamp + (expireTimer * 1000),
          9007199254740991
        )), shouldAffectActivity INTEGER
        GENERATED ALWAYS AS (
          type IS NULL
          OR
          type NOT IN (
            'change-number-notification',
            'contact-removed-notification',
            'conversation-merge',
            'group-v1-migration',
            'keychange',
            'message-history-unsynced',
            'profile-change',
            'story',
            'universal-timer-notification',
            'verified-change'
          )
        ), shouldAffectPreview INTEGER
        GENERATED ALWAYS AS (
          type IS NULL
          OR
          type NOT IN (
            'change-number-notification',
            'contact-removed-notification',
            'conversation-merge',
            'group-v1-migration',
            'keychange',
            'message-history-unsynced',
            'profile-change',
            'story',
            'universal-timer-notification',
            'verified-change'
          )
        ), isUserInitiatedMessage INTEGER
        GENERATED ALWAYS AS (
          type IS NULL
          OR
          type NOT IN (
            'change-number-notification',
            'contact-removed-notification',
            'conversation-merge',
            'group-v1-migration',
            'group-v2-change',
            'keychange',
            'message-history-unsynced',
            'profile-change',
            'story',
            'universal-timer-notification',
            'verified-change'
          )
        ), mentionsMe INTEGER NOT NULL DEFAULT 0, isGroupLeaveEvent INTEGER
        GENERATED ALWAYS AS (
          type IS 'group-v2-change' AND
          json_array_length(json_extract(json, '$.groupV2Change.details')) IS 1 AND
          json_extract(json, '$.groupV2Change.details[0].type') IS 'member-remove' AND
          json_extract(json, '$.groupV2Change.from') IS NOT NULL AND
          json_extract(json, '$.groupV2Change.from') IS json_extract(json, '$.groupV2Change.details[0].aci')
        ), isGroupLeaveEventFromOther INTEGER
        GENERATED ALWAYS AS (
          isGroupLeaveEvent IS 1
          AND
          isChangeCreatedByUs IS 0
        ), callId TEXT
        GENERATED ALWAYS AS (
          json_extract(json, '$.callId')
        ));
CREATE TABLE sqlite_stat4(tbl,idx,neq,nlt,ndlt,sample);
CREATE TABLE jobs(
        id TEXT PRIMARY KEY,
        queueType TEXT STRING NOT NULL,
        timestamp INTEGER NOT NULL,
        data STRING TEXT
      );
CREATE TABLE reactions(
        conversationId STRING,
        emoji STRING,
        fromId STRING,
        messageReceivedAt INTEGER,
        targetAuthorAci STRING,
        targetTimestamp INTEGER,
        unread INTEGER
      , messageId STRING);
CREATE TABLE senderKeys(
        id TEXT PRIMARY KEY NOT NULL,
        senderId TEXT NOT NULL,
        distributionId TEXT NOT NULL,
        data BLOB NOT NULL,
        lastUpdatedDate NUMBER NOT NULL
      );
CREATE TABLE unprocessed(
        id STRING PRIMARY KEY ASC,
        timestamp INTEGER,
        version INTEGER,
        attempts INTEGER,
        envelope TEXT,
        decrypted TEXT,
        source TEXT,
        serverTimestamp INTEGER,
        sourceServiceId STRING
      , serverGuid STRING NULL, sourceDevice INTEGER, receivedAtCounter INTEGER, urgent INTEGER, story INTEGER);
CREATE TABLE sendLogPayloads(
        id INTEGER PRIMARY KEY ASC,
        timestamp INTEGER NOT NULL,
        contentHint INTEGER NOT NULL,
        proto BLOB NOT NULL
      , urgent INTEGER, hasPniSignatureMessage INTEGER DEFAULT 0 NOT NULL);
CREATE TABLE sendLogRecipients(
        payloadId INTEGER NOT NULL,
        recipientServiceId STRING NOT NULL,
        deviceId INTEGER NOT NULL,
        PRIMARY KEY (payloadId, recipientServiceId, deviceId),
        CONSTRAINT sendLogRecipientsForeignKey
          FOREIGN KEY (payloadId)
          REFERENCES sendLogPayloads(id)
          ON DELETE CASCADE
      );
CREATE TABLE sendLogMessageIds(
        payloadId INTEGER NOT NULL,
        messageId STRING NOT NULL,
        PRIMARY KEY (payloadId, messageId),
        CONSTRAINT sendLogMessageIdsForeignKey
          FOREIGN KEY (payloadId)
          REFERENCES sendLogPayloads(id)
          ON DELETE CASCADE
      );
CREATE TABLE preKeys(
        id STRING PRIMARY KEY ASC,
        json TEXT
      , ourServiceId NUMBER
        GENERATED ALWAYS AS (json_extract(json, '$.ourServiceId')));
CREATE TABLE signedPreKeys(
        id STRING PRIMARY KEY ASC,
        json TEXT
      , ourServiceId NUMBER
        GENERATED ALWAYS AS (json_extract(json, '$.ourServiceId')));
CREATE TABLE badges(
        id TEXT PRIMARY KEY,
        category TEXT NOT NULL,
        name TEXT NOT NULL,
        descriptionTemplate TEXT NOT NULL
      );
CREATE TABLE badgeImageFiles(
        badgeId TEXT REFERENCES badges(id)
          ON DELETE CASCADE
          ON UPDATE CASCADE,
        'order' INTEGER NOT NULL,
        url TEXT NOT NULL,
        localPath TEXT,
        theme TEXT NOT NULL
      );
CREATE TABLE storyReads (
        authorId STRING NOT NULL,
        conversationId STRING NOT NULL,
        storyId STRING NOT NULL,
        storyReadDate NUMBER NOT NULL,
        PRIMARY KEY (authorId, storyId)
      );
CREATE TABLE storyDistributions(
        id STRING PRIMARY KEY NOT NULL,
        name TEXT,
        senderKeyInfoJson STRING
      , deletedAtTimestamp INTEGER, allowsReplies INTEGER, isBlockList INTEGER, storageID STRING, storageVersion INTEGER, storageUnknownFields BLOB, storageNeedsSync INTEGER);
CREATE TABLE storyDistributionMembers(
        listId STRING NOT NULL REFERENCES storyDistributions(id)
          ON DELETE CASCADE
          ON UPDATE CASCADE,
        serviceId STRING NOT NULL,
        PRIMARY KEY (listId, serviceId)
      );
CREATE TABLE uninstalled_sticker_packs (
        id STRING NOT NULL PRIMARY KEY,
        uninstalledAt NUMBER NOT NULL,
        storageID STRING,
        storageVersion NUMBER,
        storageUnknownFields BLOB,
        storageNeedsSync INTEGER NOT NULL
      );
CREATE TABLE groupCallRingCancellations(
        ringId INTEGER PRIMARY KEY,
        createdAt INTEGER NOT NULL
      );
CREATE TABLE IF NOT EXISTS 'messages_fts_data'(id INTEGER PRIMARY KEY, block BLOB);
CREATE TABLE IF NOT EXISTS 'messages_fts_idx'(segid, term, pgno, PRIMARY KEY(segid, term)) WITHOUT ROWID;
CREATE TABLE IF NOT EXISTS 'messages_fts_content'(id INTEGER PRIMARY KEY, c0);
CREATE TABLE IF NOT EXISTS 'messages_fts_docsize'(id INTEGER PRIMARY KEY, sz BLOB);
CREATE TABLE IF NOT EXISTS 'messages_fts_config'(k PRIMARY KEY, v) WITHOUT ROWID;
CREATE TABLE edited_messages(
        messageId STRING REFERENCES messages(id)
          ON DELETE CASCADE,
        sentAt INTEGER,
        readStatus INTEGER
      , conversationId STRING);
CREATE TABLE mentions (
        messageId REFERENCES messages(id) ON DELETE CASCADE,
        mentionAci STRING,
        start INTEGER,
        length INTEGER
      );
CREATE TABLE kyberPreKeys(
        id STRING PRIMARY KEY NOT NULL,
        json TEXT NOT NULL, ourServiceId NUMBER
        GENERATED ALWAYS AS (json_extract(json, '$.ourServiceId')));
CREATE TABLE callsHistory (
        callId TEXT PRIMARY KEY,
        peerId TEXT NOT NULL, -- conversation id (legacy)   uuid   groupId   roomId
        ringerId TEXT DEFAULT NULL, -- ringer uuid
        mode TEXT NOT NULL, -- enum "Direct"   "Group"
        type TEXT NOT NULL, -- enum "Audio"   "Video"   "Group"
        direction TEXT NOT NULL, -- enum "Incoming"   "Outgoing
        -- Direct: enum "Pending"   "Missed"   "Accepted"   "Deleted"
        -- Group: enum "GenericGroupCall"   "OutgoingRing"   "Ringing"   "Joined"   "Missed"   "Declined"   "Accepted"   "Deleted"
        status TEXT NOT NULL,
        timestamp INTEGER NOT NULL,
        UNIQUE (callId, peerId) ON CONFLICT FAIL
      );
[ dropped all indexes to save space in this blog post ]
CREATE TRIGGER messages_on_view_once_update AFTER UPDATE ON messages
      WHEN
        new.body IS NOT NULL AND new.isViewOnce = 1
      BEGIN
        DELETE FROM messages_fts WHERE rowid = old.rowid;
      END;
CREATE TRIGGER messages_on_insert AFTER INSERT ON messages
      WHEN new.isViewOnce IS NOT 1 AND new.storyId IS NULL
      BEGIN
        INSERT INTO messages_fts
          (rowid, body)
        VALUES
          (new.rowid, new.body);
      END;
CREATE TRIGGER messages_on_delete AFTER DELETE ON messages BEGIN
        DELETE FROM messages_fts WHERE rowid = old.rowid;
        DELETE FROM sendLogPayloads WHERE id IN (
          SELECT payloadId FROM sendLogMessageIds
          WHERE messageId = old.id
        );
        DELETE FROM reactions WHERE rowid IN (
          SELECT rowid FROM reactions
          WHERE messageId = old.id
        );
        DELETE FROM storyReads WHERE storyId = old.storyId;
      END;
CREATE VIRTUAL TABLE messages_fts USING fts5(
        body,
        tokenize = 'signal_tokenizer'
      );
CREATE TRIGGER messages_on_update AFTER UPDATE ON messages
      WHEN
        (new.body IS NULL OR old.body IS NOT new.body) AND
         new.isViewOnce IS NOT 1 AND new.storyId IS NULL
      BEGIN
        DELETE FROM messages_fts WHERE rowid = old.rowid;
        INSERT INTO messages_fts
          (rowid, body)
        VALUES
          (new.rowid, new.body);
      END;
CREATE TRIGGER messages_on_insert_insert_mentions AFTER INSERT ON messages
      BEGIN
        INSERT INTO mentions (messageId, mentionAci, start, length)
        
    SELECT messages.id, bodyRanges.value ->> 'mentionAci' as mentionAci,
      bodyRanges.value ->> 'start' as start,
      bodyRanges.value ->> 'length' as length
    FROM messages, json_each(messages.json ->> 'bodyRanges') as bodyRanges
    WHERE bodyRanges.value ->> 'mentionAci' IS NOT NULL
  
        AND messages.id = new.id;
      END;
CREATE TRIGGER messages_on_update_update_mentions AFTER UPDATE ON messages
      BEGIN
        DELETE FROM mentions WHERE messageId = new.id;
        INSERT INTO mentions (messageId, mentionAci, start, length)
        
    SELECT messages.id, bodyRanges.value ->> 'mentionAci' as mentionAci,
      bodyRanges.value ->> 'start' as start,
      bodyRanges.value ->> 'length' as length
    FROM messages, json_each(messages.json ->> 'bodyRanges') as bodyRanges
    WHERE bodyRanges.value ->> 'mentionAci' IS NOT NULL
  
        AND messages.id = new.id;
      END;
sqlite>
Finally I have the tool needed to inspect and process Signal messages that I need, without using the vendor provided client. Now on to transforming it to a more useful format. As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

11 November 2023

Reproducible Builds: Reproducible Builds in October 2023

Welcome to the October 2023 report from the Reproducible Builds project. In these reports we outline the most important things that we have been up to over the past month. As a quick recap, whilst anyone may inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries.

Reproducible Builds Summit 2023 Between October 31st and November 2nd, we held our seventh Reproducible Builds Summit in Hamburg, Germany! Our summits are a unique gathering that brings together attendees from diverse projects, united by a shared vision of advancing the Reproducible Builds effort, and this instance was no different. During this enriching event, participants had the opportunity to engage in discussions, establish connections and exchange ideas to drive progress in this vital field. A number of concrete outcomes from the summit will documented in the report for November 2023 and elsewhere. Amazingly the agenda and all notes from all sessions are already online. The Reproducible Builds team would like to thank our event sponsors who include Mullvad VPN, openSUSE, Debian, Software Freedom Conservancy, Allotropia and Aspiration Tech.

Reflections on Reflections on Trusting Trust Russ Cox posted a fascinating article on his blog prompted by the fortieth anniversary of Ken Thompson s award-winning paper, Reflections on Trusting Trust:
[ ] In March 2023, Ken gave the closing keynote [and] during the Q&A session, someone jokingly asked about the Turing award lecture, specifically can you tell us right now whether you have a backdoor into every copy of gcc and Linux still today?
Although Ken reveals (or at least claims!) that he has no such backdoor, he does admit that he has the actual code which Russ requests and subsequently dissects in great but accessible detail.

Ecosystem factors of reproducible builds Rahul Bajaj, Eduardo Fernandes, Bram Adams and Ahmed E. Hassan from the Maintenance, Construction and Intelligence of Software (MCIS) laboratory within the School of Computing, Queen s University in Ontario, Canada have published a paper on the Time to fix, causes and correlation with external ecosystem factors of unreproducible builds. The authors compare various response times within the Debian and Arch Linux distributions including, for example:
Arch Linux packages become reproducible a median of 30 days quicker when compared to Debian packages, while Debian packages remain reproducible for a median of 68 days longer once fixed.
A full PDF of their paper is available online, as are many other interesting papers on MCIS publication page.

NixOS installation image reproducible On the NixOS Discourse instance, Arnout Engelen (raboof) announced that NixOS have created an independent, bit-for-bit identical rebuilding of the nixos-minimal image that is used to install NixOS. In their post, Arnout details what exactly can be reproduced, and even includes some of the history of this endeavour:
You may remember a 2021 announcement that the minimal ISO was 100% reproducible. While back then we successfully tested that all packages that were needed to build the ISO were individually reproducible, actually rebuilding the ISO still introduced differences. This was due to some remaining problems in the hydra cache and the way the ISO was created. By the time we fixed those, regressions had popped up (notably an upstream problem in Python 3.10), and it isn t until this week that we were back to having everything reproducible and being able to validate the complete chain.
Congratulations to NixOS team for reaching this important milestone! Discussion about this announcement can be found underneath the post itself, as well as on Hacker News.

CPython source tarballs now reproducible Seth Larson published a blog post investigating the reproducibility of the CPython source tarballs. Using diffoscope, reprotest and other tools, Seth documents his work that led to a pull request to make these files reproducible which was merged by ukasz Langa.

New arm64 hardware from Codethink Long-time sponsor of the project, Codethink, have generously replaced our old Moonshot-Slides , which they have generously hosted since 2016 with new KVM-based arm64 hardware. Holger Levsen integrated these new nodes to the Reproducible Builds continuous integration framework.

Community updates On our mailing list during October 2023 there were a number of threads, including:
  • Vagrant Cascadian continued a thread about the implementation details of a snapshot archive server required for reproducing previous builds. [ ]
  • Akihiro Suda shared an update on BuildKit, a toolkit for building Docker container images. Akihiro links to a interesting talk they recently gave at DockerCon titled Reproducible builds with BuildKit for software supply-chain security.
  • Alex Zakharov started a thread discussing and proposing fixes for various tools that create ext4 filesystem images. [ ]
Elsewhere, Pol Dellaiera made a number of improvements to our website, including fixing typos and links [ ][ ], adding a NixOS Flake file [ ] and sorting our publications page by date [ ]. Vagrant Cascadian presented Reproducible Builds All The Way Down at the Open Source Firmware Conference.

Distribution work distro-info is a Debian-oriented tool that can provide information about Debian (and Ubuntu) distributions such as their codenames (eg. bookworm) and so on. This month, Benjamin Drung uploaded a new version of distro-info that added support for the SOURCE_DATE_EPOCH environment variable in order to close bug #1034422. In addition, 8 reviews of packages were added, 74 were updated and 56 were removed this month, all adding to our knowledge about identified issues. Bernhard M. Wiedemann published another monthly report about reproducibility within openSUSE.

Software development The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including: In addition, Chris Lamb fixed an issue in diffoscope, where if the equivalent of file -i returns text/plain, fallback to comparing as a text file. This was originally filed as Debian bug #1053668) by Niels Thykier. [ ] This was then uploaded to Debian (and elsewhere) as version 251.

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In October, a number of changes were made by Holger Levsen:
  • Debian-related changes:
    • Refine the handling of package blacklisting, such as sending blacklisting notifications to the #debian-reproducible-changes IRC channel. [ ][ ][ ]
    • Install systemd-oomd on all Debian bookworm nodes (re. Debian bug #1052257). [ ]
    • Detect more cases of failures to delete schroots. [ ]
    • Document various bugs in bookworm which are (currently) being manually worked around. [ ]
  • Node-related changes:
    • Integrate the new arm64 machines from Codethink. [ ][ ][ ][ ][ ][ ]
    • Improve various node cleanup routines. [ ][ ][ ][ ]
    • General node maintenance. [ ][ ][ ][ ]
  • Monitoring-related changes:
    • Remove unused Munin monitoring plugins. [ ]
    • Complain less visibly about too many installed kernels. [ ]
  • Misc:
    • Enhance the firewall handling on Jenkins nodes. [ ][ ][ ][ ]
    • Install the fish shell everywhere. [ ]
In addition, Vagrant Cascadian added some packages and configuration for snapshot experiments. [ ]

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

7 November 2023

Jonathan Dowland: gitsigns (useful neovim plugins)

gitsigns is a Neovim plugin which adds a wonderfully subtle colour annotation in the left-hand gutter to reflect changes in the buffer since the last git commit1. My long-term habit with Vim and Git is to frequently background Vim (^Z) to invoke the git command directly and then foreground Vim again (fg). Over the last few years I've been trying more and more to call vim-fugitive from within Vim instead. (I still do rebases and most merges the old-fashioned way). For the most part Gitsigns is a nice passive addition to that, but it can also do a lot of useful things that Fugitive also does. Previewing changed hunks in a little floating window, in particular when resolving an awkward merge conflict, is very handy.
An example screenshot of the Gitsigns plugin
The above picture shows it in action. I've changed two lines and added another three in the top-left buffer. Gitsigns tries to choose colours from the currently active colour scheme (in this case, tender)

  1. by default Gitsigns shows changes since the last commit, as git diff does, but you can easily switch out the base from which it compares.

5 November 2023

Thorsten Alteholz: My Debian Activities in October 2023

FTP master This month I accepted 361 and rejected 34 packages. The overall number of packages that got accepted was 362. Debian LTS This was my hundred-twelfth month that I did some work for the Debian LTS initiative, started by Raphael Hertzog at Freexian. During my allocated time I uploaded: Unfortunately upstream still could not resolve whether the patch for CVE-2023-42118 of libspf2 is valid, so no progress happened here.
I also continued to work on bind9 and try to understand why some tests fail. Last but not least I did some days of frontdesk duties and took part in the LTS meeting. Debian ELTS This month was the sixty-third ELTS month. During my allocated time I uploaded: I also continued to work on bind9 and as with the version in LTS, I try to understand why some tests fail. Last but not least I did some days of frontdesk duties . Debian Printing This month I uploaded a new upstream version of: Within the context of preserving old printing packages, I adopted: If you know of any other package that is also needed and still maintained by the QA team, please tell me. I also uploaded new upstream version of packages or uploaded a package to fix one or the other issue: This work is generously funded by Freexian! Debian Mobcom This month I uploaded a package to fix one or the other issue: Other stuff This month I uploaded new upstream version of packages, did a source upload for the transition or uploaded it to fix one or the other issue:

3 November 2023

Scarlett Gately Moore: KDE: Big fixes for Snaps! Debian and KDE neon updates.

KDE Snap Marble
A big thank you goes to my parents this week for contributing to my survival fund. With that I was able to make a big push on fixing some outstanding issues on some of our snaps.
KDE neon: No major blowups this week. Worked out issues with kmail-account-wizard thanks to David R! This is now in the hands of upstream ( porting not complete ) Worked on more orange -> green packaging fixes. Debian: Fixed bug https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1054713 in squashfuse Thanks for stopping by. Please help fund my efforts! <noscript><a href="https://liberapay.com/sgmoore/donate"><img alt="Donate using Liberapay" src="https://liberapay.com/assets/widgets/donate.svg" /></a></noscript> Donate https://gofund.me/b8b69e54

1 November 2023

Paul Wise: FLOSS Activities October 2023

Focus This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review
  • Debian wiki: RecentChanges for the month
  • Debian BTS usertags: changes for the month
  • Debian screenshots:

Administration
  • Debian IRC: rescue obsolete/unused #debian-wiki channel
  • Debian servers: rescue data from an old DebConf server
  • Debian wiki: approve accounts

Communication
  • Respond to queries from Debian users and contributors on the mailing lists and IRC

Sponsors The SWH, golang-ginkgo, DBD-ODBC, sqliteodbc work was sponsored. All other work was done on a volunteer basis.

31 October 2023

Iustin Pop: Raspberry PI OS: upgrading and cross-grading

One of the downsides of running Raspberry PI OS is the fact that - not having the resources of pure Debian - upgrades are not recommended, and cross-grades (migrating between armhf and arm64) is not even mentioned. Is this really true? It is, after all a Debian-based system, so it should in theory be doable. Let s try!

Upgrading The recently announced release based on Debian Bookworm here says:
We have always said that for a major version upgrade, you should re-image your SD card and start again with a clean image. In the past, we have suggested procedures for updating an existing image to the new version, but always with the caveat that we do not recommend it, and you do this at your own risk. This time, because the changes to the underlying architecture are so significant, we are not suggesting any procedure for upgrading a Bullseye image to Bookworm; any attempt to do this will almost certainly end up with a non-booting desktop and data loss. The only way to get Bookworm is either to create an SD card using Raspberry Pi Imager, or to download and flash a Bookworm image from here with your tool of choice.
Which means, it s time to actually try it turns out it s actually trivial, if you use RPIs as headless servers. I had only three issues:
  • if using an initrd, the new initrd-building scripts/hooks are looking for some binaries in /usr/bin, and not in /bin; solution: install manually the usrmerge package, and then re-run dpkg --configure -a;
  • also if using an initrd, the scripts are looking for the kernel config file in /boot/config-$(uname -r), and the raspberry pi kernel package doesn t provide this; workaround: modprobe configs && zcat /proc/config.gz > /boot/config-$(uname -r);
  • and finally, on normal RPI systems, that don t use manual configurations of interfaces in /etc/network/interface, migrating from the previous dhcpcd to NetworkManager will break network connectivity, and require you to log in locally and fix things.
I expect most people to hit only the 3rd, and almost no-one to use initrd on raspberry pi. But, overall, aside from these two issues and a couple of cosmetic ones (login.defs being rewritten from scratch and showing a baffling diff, for example), it was easy. Is it worth doing? Definitely. Had no data loss, and no non-booting system.

Cross-grading (32 bit to 64 bit userland) This one is actually painful. Internet searches go from it s possible, I think to it s definitely not worth trying . Examples: Aside from these, there are a gazillion other posts about switching the kernel to 64 bit. And that s worth doing on its own, but it s only half the way. So, armed with two different systems - a RPI4 4GB and a RPI Zero W2 - I tried to do this. And while it can be done, it takes many hours - first system was about 6 hours, second the same, and a third RPI4 probably took ~3 hours only since I knew the problematic issues. So, what are the steps? Basically:
  • install devscripts, since you will need dget
  • enable new architecture in dpkg: dpkg --add-architecture arm64
  • switch over apt sources to include the 64 bit repos, which are different than the 32 bit ones (Raspberry PI OS did a migration here; normally a single repository has all architectures, of course)
  • downgrade all custom rpi packages/libraries to the standard bookworm/bullseye version, since dpkg won t usually allow a single library package to have different versions (I think it s possible to override, but I didn t bother)
  • install libc for the arm64 arch (this takes some effort, it s actually a set of 3-4 packages)
  • once the above is done, install whiptail:amd64 and rejoice at running a 64-bit binary!
  • then painfully go through sets of packages and migrate the set to arm64:
    • sometimes this work via apt, sometimes you ll need to use dget and dpkg -i
    • make sure you download both the armhf and arm64 versions before doing dpkg -i, since you ll need to rollback some installs
  • at one point, you ll be able to switch over dpkg and apt to arm64, at which point the default architecture flips over; from here, if you ve done it at the right moment, it becomes very easy; you ll probably need an apt install --fix-broken, though, at first
  • and then, finish by replacing all packages with arm64 versions
  • and then, dpkg --remove-architecture armhf, reboot, and profit!
But it s tears and blood to get to that point

Pain point 1: RPI custom versions of packages Since the 32bit armhf architecture is a bit weird - having many variations - it turns out that raspberry pi OS has many packages that are very slightly tweaked to disable a compilation flag or work around build/test failures, or whatnot. Since we talk here about 64-bit capable processors, almost none of these are needed, but they do make life harder since the 64 bit version doesn t have those overrides. So what is needed would be to say downgrade all armhf packages to the version in debian upstream repo , but I couldn t find the right apt pinning incantation to do that. So what I did was to remove the 32bit repos, then use apt-show-versions to see which packages have versions that are no longer in any repo, then downgrade them. There s a further, minor, complication that there were about 3-4 packages with same version but different hash (!), which simply needed apt install --reinstall, I think.

Pain point 2: architecture independent packages There is one very big issue with dpkg in all this story, and the one that makes things very problematic: while you can have a library package installed multiple times for different architectures, as the files live in different paths, a non-library package can only be installed once (usually). For binary packages (arch:any), that is fine. But architecture-independent packages (arch:all) are problematic since usually they depend on a binary package, but they always depend on the default architecture version! Hrmm, and I just realise I don t have logs from this, so I m only ~80% confident. But basically:
  • vim-solarized (arch:all) depends on vim (arch:any)
  • if you replace vim armhf with vim arm64, this will break vim-solarized, until the default architecture becomes arm64
So you need to keep track of which packages apt will de-install, for later re-installation. It is possible that Multi-Arch: foreign solves this, per the debian wiki which says:
Note that even though Architecture: all and Multi-Arch: foreign may look like similar concepts, they are not. The former means that the same binary package can be installed on different architectures. Yet, after installation such packages are treated as if they were native architecture (by definition the architecture of the dpkg package) packages. Thus Architecture: all packages cannot satisfy dependencies from other architectures without being marked Multi-Arch foreign.
It also has warnings about how to properly use this. But, in general, not many packages have it, so it is a problem.

Pain point 3: remove + install vs overwrite It seems that depending on how the solver computes a solution, when migrating a package from 32 to 64 bit, it can choose either to:
  • overwrite in place the package (akin to dpkg -i)
  • remove + install later
The former is OK, the later is not. Or, actually, it might be that apt never can do this, for example (edited for brevity):
# apt install systemd:arm64 --no-install-recommends
The following packages will be REMOVED:
  systemd
The following NEW packages will be installed:
  systemd:arm64
0 upgraded, 1 newly installed, 1 to remove and 35 not upgraded.
Do you want to continue? [Y/n] y
dpkg: systemd: dependency problems, but removing anyway as you requested:
 systemd-sysv depends on systemd.
Removing systemd (247.3-7+deb11u2) ...
systemd is the active init system, please switch to another before removing systemd.
dpkg: error processing package systemd (--remove):
 installed systemd package pre-removal script subprocess returned error exit status 1
dpkg: too many errors, stopping
Errors were encountered while processing:
 systemd
Processing was halted because there were too many errors.
But at the same time, overwrite in place is all good - via dpkg -i from /var/cache/apt/archives. In this case it manifested via a prerm script, in other cases is manifests via dependencies that are no longer satisfied for packages that can t be removed, etc. etc. So you will have to resort to dpkg -i a lot.

Pain point 4: lib- packages that are not lib During the whole process, it is very tempting to just go ahead and install the corresponding arm64 package for all armhf lib package, in one go, since these can coexist. Well, this simple plan is complicated by the fact that some packages are named libfoo-bar, but are actual holding (e.g.) the bar binary for the libfoo package. Examples:
  • libmagic-mgc contains /usr/lib/file/magic.mgc, which conflicts between the 32 and 64 bit versions; of course, it s the exact same file, so this should be an arch:all package, but
  • libpam-modules-bin and liblockfile-bin actually contain binaries (per the -bin suffix)
It s possible to work around all this, but it changes a 1 minute:
# apt install $(dpkg -i   grep ^ii   awk ' print $2 ' grep :amrhf sed -e 's/:armhf/:arm64')
into a 10-20 minutes fight with packages (like most other steps).

Is it worth doing? Compared to the simple bullseye bookworm upgrade, I m not sure about this. The result? Yes, definitely, the system feels - weirdly - much more responsive, logged in over SSH. I guess the arm64 base architecture has some more efficient ops than the lowest denominator armhf , so to say (e.g. there was in the 32 bit version some rpi-custom package with string ops), and thus migrating to 64 bit makes more things faster , but this is subjective so it might be actually not true. But from the point of view of the effort? Unless you like to play with dpkg and apt, and understand how these work and break, I d rather say, migrate to ansible and automate the deployment. It s doable, sure, and by the third system, I got this nailed down pretty well, but it was a lot of time spent. The good aspect is that I did 3 migrations:
  • rpi zero w2: bullseye 32 bit to 64 bit, then bullseye to bookworm
  • rpi 4: bullseye to bookworm, then bookworm 32bit to 64 bit
  • same, again, for a more important system
And all three worked well and no data loss. But I m really glad I have this behind me, I probably wouldn t do a fourth system, even if forced And now, waiting for the RPI 5 to be available See you!

29 October 2023

Joachim Breitner: Squash your Github PRs with one click

TL;DR: Squash your PRs with one click at https://squasher.nomeata.de/. Very recently I got this response from the project maintainer at a pull request I contributed: Thanks, approved, please squash so that I can merge. It s nice that my contribution can go it, but why did the maintainer not just press the Squash and merge button , and instead adds the this unnecessary roundtrip to the process? Anyways, maintainers make the rules, so I play by them. But unlike the maintainer, who can squash-and-merge with just one click, squashing the PR s branch is surprisingly laberous: Github does not allow you to do that via the Web UI (and hence on mobile), and it seems you are expected to go to your computer and juggle with git rebase --interactive. I found this rather annoying, so I created Squasher, a simple service that will squash your branch for you. There is no configuration, just paste the PR url. It will use the PR title and body as the commit message (which is obviously the right way ), and create the commit in your name:
Squasher in actionSquasher in action
If you find this useful, or found it to be buggy, let me know. The code is at https://github.com/nomeata/squasher if you are curious about it.

22 October 2023

Ian Jackson: DigiSpark (ATTiny85) - Arduino, C, Rust, build systems

Recently I completed a small project, including an embedded microcontroller. For me, using the popular Arduino IDE, and C, was a mistake. The experience with Rust was better, but still very exciting, and not in a good way. Here follows the rant. Introduction In a recent project (I ll write about the purpose, and the hardware in another post) I chose to use a DigiSpark board. This is a small board with a USB-A tongue (but not a proper plug), and an ATTiny85 microcontroller, This chip has 8 pins and is quite small really, but it was plenty for my application. By choosing something popular, I hoped for convenient hardware, and an uncomplicated experience. Convenient hardware, I got. Arduino IDE The usual way to program these boards is via an IDE. I thought I d go with the flow and try that. I knew these were closely related to actual Arduinos and saw that the IDE package arduino was in Debian. But it turns out that the Debian package s version doesn t support the DigiSpark. (AFAICT from the list it offered me, I m not sure it supports any ATTiny85 board.) Also, disturbingly, its board manager seemed to be offering to install board support, suggesting it would download stuff from the internet and run it. That wouldn t be acceptable for my main laptop. I didn t expect to be doing much programming or debugging, and the project didn t have significant security requirements: the chip, in my circuit, has only a very narrow ability do anything to the real world, and no network connection of any kind. So I thought it would be tolerable to do the project on my low-security video laptop . That s the machine where I m prepared to say yes to installing random software off the internet. So I went to the upstream Arduino site and downloaded a tarball containing the Arduino IDE. After unpacking that in /opt it ran and produced a pointy-clicky IDE, as expected. I had already found a 3rd-party tutorial saying I needed to add a magic URL (from the DigiSpark s vendor) in the preferences. That indeed allowed it to download a whole pile of stuff. Compilers, bootloader clients, god knows what. However, my tiny test program didn t make it to the board. Half-buried in a too-small window was an error message about the board s bootloader ( Micronucleus ) being too new. The boards I had came pre-flashed with micronucleus 2.2. Which is hardly new, But even so the official Arduino IDE (or maybe the DigiSpark s board package?) still contains an old version. So now we have all the downsides of curl bash-ware, but we re lacking the it s up to date and it just works upsides. Further digging found some random forum posts which suggested simply downloading a newer micronucleus and manually stuffing it into the right place: one overwrites a specific file, in the middle the heaps of stuff that the Arduino IDE s board support downloader squirrels away in your home directory. (In my case, the home directory of the untrusted shared user on the video laptop,) So, whatever . I did that. And it worked! Having demo d my ability to run code on the board, I set about writing my program. Writing C again The programming language offered via the Arduino IDE is C. It s been a little while since I started a new thing in C. After having spent so much of the last several years writing Rust. C s primitiveness quickly started to grate, and the program couldn t easily be as DRY as I wanted (Don t Repeat Yourself, see Wilson et al, 2012, 4, p.6). But, I carried on; after all, this was going to be quite a small job. Soon enough I had a program that looked right and compiled. Before testing it in circuit, I wanted to do some QA. So I wrote a simulator harness that #included my Arduino source file, and provided imitations of the few Arduino library calls my program used. As an side advantage, I could build and run the simulation on my main machine, in my normal development environment (Emacs, make, etc.). The simulator runs confirmed the correct behaviour. (Perhaps there would have been some more faithful simulation tool, but the Arduino IDE didn t seem to offer it, and I wasn t inclined to go further down that kind of path.) So I got the video laptop out, and used the Arduino IDE to flash the program. It didn t run properly. It hung almost immediately. Some very ad-hoc debugging via led-blinking (like printf debugging, only much worse) convinced me that my problem was as follows: Arduino C has 16-bit ints. My test harness was on my 64-bit Linux machine. C was autoconverting things (when building for the micrcocontroller). The way the Arduino IDE ran the compiler didn t pass the warning options necessary to spot narrowing implicit conversions. Those warnings aren t the default in C in general because C compilers hate us all for compatibility reasons. I don t know why those warnings are not the default in the Arduino IDE, but my guess is that they didn t want to bother poor novice programmers with messages from the compiler explaining how their program is quite possibly wrong. After all, users don t like error messages so we shouldn t report errors. And novice programmers are especially fazed by error messages so it s better to just let them struggle themselves with the arcane mysteries of undefined behaviour in C? The Arduino IDE does offer a dropdown for compiler warnings . The default is None. Setting it to All didn t produce anything about my integer overflow bugs. And, the output was very hard to find anyway because the log window has a constant stream of strange messages from javax.jmdns, with hex DNS packet dumps. WTF. Other things that were vexing about the Arduino IDE: it has fairly fixed notions (which don t seem to be documented) about how your files and directories ought to be laid out, and magical machinery for finding things you put nearby its sketch (as it calls them) and sticking them in its ear, causing lossage. It has a tendency to become confused if you edit files under its feet (e.g. with git checkout). It wasn t really very suited to a workflow where principal development occurs elsewhere. And, important settings such as the project s clock speed, or even the target board, or the compiler warning settings to use weren t stored in the project directory along with the actual code. I didn t look too hard, but I presume they must be in a dotfile somewhere. This is madness. Apparently there is an Arduino CLI too. But I was already quite exasperated, and I didn t like the idea of going so far off the beaten path, when the whole point of using all this was to stay with popular tooling and share fate with others. (How do these others cope? I have no idea.) As for the integer overflow bug: I didn t seriously consider trying to figure out how to control in detail the C compiler options passed by the Arduino IDE. (Perhaps this is possible, but not really documented?) I did consider trying to run a cross-compiler myself from the command line, with appropriate warning options, but that would have involved providing (or stubbing, again) the Arduino/DigiSpark libraries (and bugs could easily lurk at that interface). Instead, I thought, if only I had written the thing in Rust . But that wasn t possible, was it? Does Rust even support this board? Rust on the DigiSpark I did a cursory web search and found a very useful blog post by Dylan Garrett. This encouraged me to think it might be a workable strategy. I looked at the instructions there. It seemed like I could run them via the privsep arrangement I use to protect myself when developing using upstream cargo packages from crates.io. I got surprisingly far surprisingly quickly. It did, rather startlingly, cause my rustup to download a random recent Nightly Rust, but I have six of those already for other Reasons. Very quickly I got the trinket LED blink example, referenced by Dylan s blog post, to compile. Manually copying the file to the video laptop allowed me to run the previously-downloaded micronucleus executable and successfully run the blink example on my board! I thought a more principled approach to the bootloader client might allow a more convenient workflow. I found the upstream Micronucleus git releases and tags, and had a look over its source code, release dates, etc. It seemed plausible, so I compiled v2.6 from source. That was a success: now I could build and install a Rust program onto my board, from the command line, on my main machine. No more pratting about with the video laptop. I had got further, more quickly, with Rust, than with the Arduino IDE, and the outcome and workflow was superior. So, basking in my success, I copied the directory containing the example into my own project, renamed it, and adjusted the path references. That didn t work. Now it didn t build. Even after I copied about .cargo/config.toml and rust-toolchain.toml it didn t build, producing a variety of exciting messages, depending what precisely I tried. I don t have detailed logs of my flailing: the instructions say to build it by cd ing to the subdirectory, and, given that what I was trying to do was to not follow those instructions, it didn t seem sensible to try to prepare a proper repro so I could file a ticket. I wasn t optimistic about investigating it more deeply myself: I have some experience of fighting cargo, and it s not usually fun. Looking at some of the build control files, things seemed quite complicated. Additionally, not all of the crates are on crates.io. I have no idea why not. So, I would need to supply local copies of them anyway. I decided to just git subtree add the avr-hal git tree. (That seemed better than the approach taken by the avr-hal project s cargo template, since that template involve a cargo dependency on a foreign git repository. Perhaps it would be possible to turn them into path dependencies, but given that I had evidence of file-location-sensitive behaviour, which I didn t feel like I wanted to spend time investigating, using that seems like it would possibly have invited more trouble. Also, I don t like package templates very much. They re a form of clone-and-hack: you end up stuck with whatever bugs or oddities exist in the version of the template which was current when you started.) Since I couldn t get things to build outside avr-hal, I edited the example, within avr-hal, to refer to my (one) program.rs file outside avr-hal, with a #[path] instruction. That s not pretty, but it worked. I also had to write a nasty shell script to work around the lack of good support in my nailing-cargo privsep tool for builds where cargo must be invoked in a deep subdirectory, and/or Cargo.lock isn t where it expects, and/or the target directory containing build products is in a weird place. It also has to filter the output from cargo to adjust the pathnames in the error messages. Otherwise, running both cd A; cargo build and cd B; cargo build from a Makefile produces confusing sets of error messages, some of which contain filenames relative to A and some relative to B, making it impossible for my Emacs to reliably find the right file. RIIR (Rewrite It In Rust) Having got my build tooling sorted out I could go back to my actual program. I translated the main program, and the simulator, from C to Rust, more or less line-by-line. I made the Rust version of the simulator produce the same output format as the C one. That let me check that the two programs had the same (simulated) behaviour. Which they did (after fixing a few glitches in the simulator log formatting). Emboldened, I flashed the Rust version of my program to the DigiSpark. It worked right away! RIIR had caused the bug to vanish. Of course, to rewrite the program in Rust, and get it to compile, it was necessary to be careful about the types of all the various integers, so that s not so surprising. Indeed, it was the point. I was then able to refactor the program to be a bit more natural and DRY, and improve some internal interfaces. Rust s greater power, compared to C, made those cleanups easier, so making them worthwhile. However, when doing real-world testing I found a weird problem: my timings were off. Measured, the real program was too fast by a factor of slightly more than 2. A bit of searching (and searching my memory) revealed the cause: I was using a board template for an Adafruit Trinket. The Trinket has a clock speed of 8MHz. But the DigiSpark runs at 16.5MHz. (This is discussed in a ticket against one of the C/C++ libraries supporting the ATTiny85 chip.) The Arduino IDE had offered me a choice of clock speeds. I have no idea how that dropdown menu took effect; I suspect it was adding prelude code to adjust the clock prescaler. But my attempts to mess with the CPU clock prescaler register by hand at the start of my Rust program didn t bear fruit. So instead, I adopted a bodge: since my code has (for code structure reasons, amongst others) only one place where it dealt with the underlying hardware s notion of time, I simply changed my delay function to adjust the passed-in delay values, compensating for the wrong clock speed. There was probably a more principled way. For example I could have (re)based my work on either of the two unmerged open MRs which added proper support for the DigiSpark board, rather than abusing the Adafruit Trinket definition. But, having a nearly-working setup, and an explanation for the behaviour, I preferred the narrower fix to reopening any cans of worms. An offer of help As will be obvious from this posting, I m not an expert in dev tools for embedded systems. Far from it. This area seems like quite a deep swamp, and I m probably not the person to help drain it. (Frankly, much of the improvement work ought to be done, and paid for, by hardware vendors.) But, as a full Member of the Debian Project, I have considerable gatekeeping authority there. I also have much experience of software packaging, build systems, and release management. If anyone wants to try to improve the situation with embedded tooling in Debian, and is willing to do the actual packaging work. I would be happy to advise, and to review and sponsor your contributions. An obvious candidate: it seems to me that micronucleus could easily be in Debian. Possibly a DigiSpark board definition could be provided to go with the arduino package. Unfortunately, IMO Debian s Rust packaging tooling and workflows are very poor, and the first of my suggestions for improvement wasn t well received. So if you need help with improving Rust packages in Debian, please talk to the Debian Rust Team yourself. Conclusions Embedded programming is still rather a mess and probably always will be. Embedded build systems can be bizarre. Documentation is scant. You re often expected to download board support packages full of mystery binaries, from the board vendor (or others). Dev tooling is maddening, especially if aimed at novice programmers. You want version control? Hermetic tracking of your project s build and install configuration? Actually to be told by the compiler when you write obvious bugs? You re way off the beaten track. As ever, Free Software is under-resourced and the maintainers are often busy, or (reasonably) have other things to do with their lives. All is not lost Rust can be a significantly better bet than C for embedded software: The Rust compiler will catch a good proportion of programming errors, and an experienced Rust programmer can arrange (by suitable internal architecture) to catch nearly all of them. When writing for a chip in the middle of some circuit, where debugging involves staring an LED or a multimeter, that s precisely what you want. Rust embedded dev tooling was, in this case, considerably better. Still quite chaotic and strange, and less mature, perhaps. But: significantly fewer mystery downloads, and significantly less crazy deviations from the language s normal build system. Overall, less bad software supply chain integrity. The ATTiny85 chip, and the DigiSpark board, served my hardware needs very well. (More about the hardware aspects of this project in a future posting.)

comment count unavailable comments

20 October 2023

Freexian Collaborators: Debian Contributions: Freexian meetup, debusine updates, lpr/lpd in Debian, and more! (by Utkarsh Gupta, Stefano Rivera)

Contributing to Debian is part of Freexian s mission. This article covers the latest achievements of Freexian and their collaborators. All of this is made possible by organizations subscribing to our Long Term Support contracts and consulting services.

Freexian Meetup, by Stefano Rivera, Utkarsh Gupta, et al. During DebConf, Freexian organized a meetup for its collaborators and those interested in learning more about Freexian and its services. It was well received and many people interested in Freexian showed up. Some developers who were interested in contributing to LTS came to get more details about joining the project. And some prospective customers came to get to know us and ask questions. Sadly, the tragic loss of Abraham shook DebConf, both individually and structurally. The meetup got rescheduled to a small room without video coverage. With that, we still had a wholesome interaction and here s a quick picture from the meetup taken by Utkarsh (which is also why he s missing!).

Debusine, by Rapha l Hertzog, et al. Freexian has been investing into debusine for a while, but development speed is about to increase dramatically thanks to funding from SovereignTechFund.de. Rapha l laid out the 5 milestones of the funding contract, and filed the issues for the first milestone. Together with Enrico and Stefano, they established a workflow for the expanded team. Among the first steps of this milestone, Enrico started to work on a developer-friendly description of debusine that we can use when we reach out to the many Debian contributors that we will have to interact with. And Rapha l started the design work of the autopkgtest and lintian tasks, i.e. what s the interface to schedule such tasks, what behavior and what associated options do we support? At this point you might wonder what debusine is supposed to be let us try to answer this: Debusine manages scheduling and distribution of Debian-related build and QA tasks to a network of worker machines. It also manages the resulting artifacts and provides the results in an easy to consume way. We want to make it easy for Debian contributors to leverage all the great QA tools that Debian provides. We want to build the next generation of Debian s build infrastructure, one that will continue to reliably do what it already does, but that will also enable distribution-wide experiments, custom package repositories and custom workflows with advanced package reviews. If this all sounds interesting to you, don t hesitate to watch the project on salsa.debian.org and to contribute.

lpr/lpd in Debian, by Thorsten Alteholz During Debconf23, Till Kamppeter presented CPDB (Common Print Dialog Backend), a new way to handle print queues. After this talk it was discussed whether the old lpr/lpd based printing system could be abandoned in Debian or whether there is still demand for it. So Thorsten asked on the debian-devel email list whether anybody uses it. Oddly enough, these old packages are still useful:
  • Within a small network it is easier to distribute a printcap file, than to properly configure cups clients.
  • One of the biggest manufacturers of WLAN router and DSL boxes only supports raw queues when attaching an USB printer to their hardware. Admittedly the CPDB still has problems with such raw queues.
  • The Pharos printing system at MIT is still lpd-based.
As a result, the lpr/lpd stuff is not yet ready to be abandoned and Thorsten will adopt the relevant packages (or rather move them under the umbrella of the debian-printing team). Though it is not planned to develop new features, those packages should at least have a maintainer. This month Thorsten adopted rlpr, an utility for lpd printing without using /etc/printcap. The next one he is working on is lprng, a lpr/lpd printer spooling system. If you know of any other package that is also needed and still maintained by the QA team, please tell Thorsten.

/usr-merge, by Helmut Grohne Discussion about lifting the file move moratorium has been initiated with the CTTE and the release team. A formal lift is dependent on updating debootstrap in older suites though. A significant number of packages can automatically move their systemd unit files if dh_installsystemd and systemd.pc change their installation targets. Unfortunately, doing so makes some packages FTBFS and therefore patches have been filed. The analysis tool, dumat, has been enhanced to better understand which upgrade scenarios are considered supported to reduce false positive bug filings and gained a mode for local operation on a .changes file meant for inclusion in salsa-ci. The filing of bugs from dumat is still manual to improve the quality of reports. Since September, the moratorium has been lifted.

Miscellaneous contributions
  • Rapha l updated Django s backport in bullseye-backports to match the latest security release that was published in bookworm. Tracker.debian.org is still using that backport.
  • Helmut Grohne sent 13 patches for cross build failures.
  • Helmut Grohne performed a maintenance upload of debvm enabling its use in autopkgtests.
  • Helmut Grohne wrote an API-compatible reimplementation of autopkgtest-build-qemu. It is powered by mmdebstrap, therefore unprivileged, EFI-only and will soon be included in mmdebstrap.
  • Santiago continued the work regarding how to make it easier to (automatically) test reverse dependencies. An example of the ongoing work was presented during the Salsa CI BoF at DebConf 23.
    In fact, omniorb-dfsg test pipelines as the above were used for the omniorb-dfsg 4.3.0 transition, verifying how the reverse dependencies (tango, pytango and omnievents) were built and how their autopkgtest jobs run with the to-be-uploaded omniorb-dfsg new release.
  • Utkarsh and Stefano attended and helped run DebConf 23. Also continued winding up DebConf 22 accounting.
  • Anton Gladky did some science team uploads to fix RC bugs.

18 October 2023

Scarlett Gately Moore: KDE: Snap transition complete, 23.08.2 released!

KDE MascotKDE Mascot
I have completed the the Big move ! There are still a few lingering MR s, but I am sure they will be approved so I can merge soon. With the move I was also able to release 23.08.2 for most release service applications. Enjoy! You can find them all here: https://snapcraft.io/search?q=KDE I still need to raise a bit more to pay the Internet bill. If you can spare some change please consider a donation. Thank you! <script src= https://liberapay.com/sgmoore/widgets/button.js ></script> <noscript><a href= https://liberapay.com/sgmoore/donate ><img alt= Donate using Liberapay src= https://liberapay.com/assets/widgets/donate.svg ></a></noscript> Donate https://gofund.me/b8b69e54

8 October 2023

Niels Thykier: A new Debian package helper: debputy

I have made a new helper for producing Debian packages called debputy. Today, I uploaded it to Debian unstable for the first time. This enables others to migrate their package build using dh +debputy rather than the classic dh. Eventually, I hope to remove dh entirely from this equation, so you only need debputy. But for now, debputy still leverages dh support for managing upstream build systems. The debputy tool takes a radicially different approach to packaging compared to our existing packaging methods by using a single highlevel manifest instead of all the debian/install (etc.) and no hook targets in debian/rules. Here are some of the things that debputy can do or does: There are also some features that debputy does not support at the moment: There are all limitations of the current work in progress. I hope to resolve them all in due time.

Trying debputy With the limitations aside, lets talk about how you would go about migrating a package:
# Assuming here you have already run: apt install dh-debputy
$ git clone https://salsa.debian.org/rra/kstart
[...]
$ cd kstart
# Add a Build-Dependency on dh-sequence-debputy
$ perl -n -i -e \
   'print; print " dh-sequence-debputy,\n" if m/debhelper-compat/;' \
    debian/control
$ debputy migrate-from-dh --apply-changes
debputy: info: Loading plugin debputy (version: archive/debian/4.3-1) ...
debputy: info: Verifying the generating manifest
debputy: info: Updated manifest debian/debputy.manifest
debputy: info: Removals:
debputy: info:   rm -f "./debian/docs"
debputy: info:   rm -f "./debian/examples"
debputy: info: Migrations performed successfully
debputy: info: Remember to validate the resulting binary packages after rebuilding with debputy
$ cat debian/debputy.manifest 
manifest-version: '0.1'
installations:
- install-docs:
    sources:
    - NEWS
    - README
    - TODO
- install-examples:
    source: examples/krenew-agent
$ git add debian/debputy.manifest
$ git commit --signoff -am"Migrate to debputy"
# Run build tool of choice to verify the output.
This is of course a specific example that works out of the box. If you were to try this on debianutils (from git), the output would look something like this:
$ debputy migrate-from-dh
debputy: info: Loading plugin debputy (version: 5.13-13-g9836721) ...
debputy: error: Unable to migrate automatically due to missing features in debputy.
  * The "debian/triggers" debhelper config file (used by dh_installdeb is currently not supported by debputy.
Use --acceptable-migration-issues=[...] to convert this into a warning [...]
And indeed, debianutils requires at least 4 debhelper features beyond what debputy can support at the moment (all related to maintscripts and triggers).

Rapid feedback Rapid feedback cycles are important for keeping developers engaged in their work. The debputy tool provides the following features to enable rapid feedback.

Immediate manifest validation It would be absolutely horrible if you had to do a full-rebuild only to realize you got the manifest syntax wrong. Therefore, debputy has a check-manifest command that checks the manifest for syntactical and semantic issues.
$ cat debian/debputy.manifest
manifest-version: '0.1'
installations:
- install-docs:
    sources:
    - GETTING-STARTED-WITH-dh-debputy.md
    - MANIFEST-FORMAT.md
    - MIGRATING-A-DH-PLUGIN.md
$ debputy check-manifest
debputy: info: Loading plugin debputy (version: 0.1.7-1-gf34bd66) ...
debputy: info: No errors detected.
$ cat <<EOF >> debian/debputy.manifest
- install:
    sourced: foo
    as: usr/bin/foo
EOF
# Did I typo anything?
$ debputy check-manifest
debputy: info: Loading plugin debputy (version: 0.1.7-2-g4ef8c2f) ...
debputy: warning: Possible typo: The key "sourced" at "installations[1].install" should probably have been 'source'
debputy: error: Unknown keys " 'sourced' " at installations[1].install".  Keys that could be used here are: sources, when, dest-dir, source, into.
debputy: info: Loading plugin debputy (version: 0.1.7-2-g4ef8c2f) ...
$ sed -i s/sourced:/source:/ debian/debputy.manifest
$ debputy check-manifest
debputy: info: Loading plugin debputy (version: 0.1.7-2-g4ef8c2f) ...
debputy: info: No errors detected.
The debputy check-manifest command is limited to the manifest itself and does not warn about foo not existing as it could be produced as apart of the upstream build system. Therefore, there are still issues that can only be detected at package build time. But where debputy can reliably give you immediate feedback, it will do so.

Idempotence: Clean re-runs of dh_debputy without clean/rebuild If you read the fine print of many debhelper commands, you may see the following note their manpage:
This command is not idempotent. dh_prep(1) should be called between invocations of this command Manpage of an anonymous debhelper tool
What this usually means, is that if you run the command twice, you will get its maintscript change (etc.) twice in the final deb. This fits into our single-use clean throw-away chroot builds on the buildds and CI as well as dpkg-buildpackage s no-clean (-nc) option. Single-use throw-away chroots are not very helpful for debugging though, so I rarely use them when doing the majority of my packaging work as I do not want to wait for the chroot initialization (including installing of build-depends). But even then, I have found that dpkg-buildpackage -nc has been useless for me in many cases as I am stuck between two options:
  • With -nc, you often still interact with the upstream build system. As an example, debhelper will do a dh_prep followed by dh_auto_install, so now we are waiting for upstream s install target to run again. What should have taken seconds now easily take 0.5-1 minute extra per attempt.
  • If you want to by-pass this, you have to manually call the helpers needed (in correct order) and every run accumulates cruft from previous runs to the point that cruft drowns out the actual change you want to see. Also, I am rarely in the mood to play human dh, when I am debugging an issue that I failed to fix in my first, second and third try.
As you can probably tell, neither option has worked that well for me. But with dh_debputy, I have made it a goal that it will not self-taint the final output. If dh_debputy fails, you should be able to tweak the manifest and re-run dh_debputy with the same arguments.
  • No waiting for dpkg-buildpackage -nc nor anything implied by that.
  • No self-tainting of the final deb. The result you get, is the result you would have gotten if the previous dh_debputy run never happened.
  • Because dh_debputy produces the final result, I do not have to run multiple tools in the right order.
Obviously, this is currently a lot easier, because debputy is not involved in the upstream build system at all. If this feature is useful to you, please do let me know and I will try to preserve it as debputy progresses in features.

Packager provided files On a different topic, have you ever wondered what kind of files you can place into the debian directory that debhelper automatically picks up or reacts too? I do not have an answer to that beyond it is over 80 files and that as the maintainer of debhelper, I am not willing to manually maintain such a list manually. However, I do know what the answer is in debputy, because I can just ask debputy:
$ debputy plugin list packager-provided-files
+-----------------------------+---------------------------------------------[...]
  Stem                          Installed As                                [...]
+-----------------------------+---------------------------------------------[...]
  NEWS                          /usr/share/doc/ name /NEWS.Debian           [...]
  README.Debian                 /usr/share/doc/ name /README.Debian         [...]
  TODO                          /usr/share/doc/ name /TODO.Debian           [...]
  bug-control                   /usr/share/bug/ name /control               [...]
  bug-presubj                   /usr/share/bug/ name /presubj               [...]
  bug-script                    /usr/share/bug/ name /script                [...]
  changelog                     /usr/share/doc/ name /changelog.Debian      [...]
  copyright                     /usr/share/doc/ name /copyright             [...]
[...]
This will list all file types (Stem column) that debputy knows about and it accounts for any plugin that debputy can find. Note to be deterministic, debputy will not auto-load plugins that have not been explicitly requested during package builds. So this list could list files that are available but not active for your current package. Note the output is not intended to be machine readable. That may come in later version. Feel free to chime in if you have a concrete use-case.

Take it for a spin As I started this blog post with, debputy is now available in unstable. I hope you will take it for a spin on some of your simpler packages and provide feedback on it.  For documentation, please have a look at: Thanks for considering PS: My deepest respect to the fakeroot maintainers. That game of whack-a-mole is not something I would have been willing to maintain. I think fakeroot is like the Python GIL in the sense that it has been important in getting Debian to where it is today. But at the same time, I feel it is time to let go of the crutch and find a proper solution.

3 October 2023

Bastian Blank: Introducing uploads to Debian by git tag

Several years ago, several people proposed a mechanism to upload packages to Debian just by doing "git tag" and "git push". Two not too long discussions on debian-devel (first and second) did not end with an agreement how this mechanism could work.1 The central problem was the ability to properly trace uploads back to the ones who authorised it. Now, several years later and after re-reading those e-mail threads, I again was stopped at the question: can we do this? Yes, it would not be just "git tag", but could we do with close enough? I have some rudimentary code ready to actually do uploads from the CI. However the list of caveats are currently pretty long. Yes, it works in principle. It is still disabled here, because in practice it does not yet work. Problems with this setup So what are the problems? It requires the git tags to include the both signed files for a successful source upload. This is solved by a new tool that could be a git sub-command. It just creates the source package, signs it and adds the signed file describing the upload (the .dsc and .changes file) to the tag to be pushed. The CI then extracts the signed files from the tag message and does it's work as normal. It requires a sufficiently reproducible build for source packages. Right now it is only known to work with the special 3.0 (gitarchive) source format, but even that requires the latest version of this format. No idea if it is possible to use others, like 3.0 (quilt) for this purpose. The shared GitLab runner provides by Salsa do not allow ftp access to the outside. But Debian still uses ftp to do uploads. At least if you don't want to share your ssh key, which can't be restricted to uploads only, but ssh would not work either. And as the current host for those builds, the Google Cloud Platform, does not provide connection tracking support for ftp, there is no easy way to allow that without just allowing everything. So we have no way to currently actually perform uploads from this platform. Further work As this is code running in a CI under the control of the developer, we can easily do other workflows. Some teams do workflows that do tags after acceptance into the Debian archive. Or they don't use tags at all. With some other markers, like variables or branch names, this support can be expanded easily. Unrelated to this task here, we might want to think about tying the .changes files for uploads to the target archive. As this code makes all of them readily available in form of tag message, replaying them into possible other archives might be more of a concern now. Conclusion So to come back to the question, yes we can. We can prepare uploads using our CI in a way that they would be accepted into the Debian archive. It just needs some more work on infrastructure.

  1. Here I have to admit that, after reading it again, I'm not really proud of my own behaviour and have to apologise.

1 October 2023

Paul Wise: FLOSS Activities September 2023

Focus This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review
  • Spam: reported 2 Debian bug reports
  • Debian wiki: RecentChanges for the month
  • Debian BTS usertags: changes for the month
  • Debian screenshots:
    • approved fzf lame lsd termshark vifm
    • rejected orthanc (private data), gpr/orthanc (Windows), qrencode (random QR codes), weboob-qt (chess website)

Administration
  • Debian IRC: fix #debian-pkg-security topic/metadata
  • Debian wiki: unblock IP addresses, approve accounts

Communication

Sponsors The SWH work was sponsored. All other work was done on a volunteer basis.

30 September 2023

Fran ois Marier: Things I do after uploading a new package to Debian

There are a couple of things I tend to do after packaging a piece of software for Debian, filing an Intent To Package bug and uploading the package. This is both a checklist for me and (hopefully) a way to inspire other maintainers to go beyond the basic package maintainer duties as documented in the Debian Developer's Reference. If I've missed anything, please leave an comment or send me an email!

Salsa for collaborative development To foster collaboration and allow others to contribute to the packaging, I upload my package to a new subproject on Salsa. By doing this, I enable other Debian contributors to make improvements and propose changes via merge requests. I also like to upload the project logo in the settings page (i.e. https://salsa.debian.org/debian/packagename/edit) since that will show up on some dashboards like the Package overview.

Launchpad for interacting with downstream Ubuntu users While Debian is my primary focus, I also want to keep an eye on how my package is doing on derivative distributions like Ubuntu. To do this, I subscribe to bugs related to my package on Launchpad. Ubuntu bugs are rarely Ubuntu-specific and so I will often fix them in Debian. I also set myself as the answer contact on Launchpad Answers since these questions are often the sign of a Debian or a lack of documentation. I don't generally bother to fix bugs on Ubuntu directly though since I've not had much luck with packages in universe lately. I'd rather not spend much time preparing a package that's not going to end up being released to users as part of a Stable Release Update. On the other hand, I have succesfully requested simple Debian syncs when an important update was uploaded after the Debian Import Freeze.

Screenshots and tags I take screenshots of my package and upload them on https://screenshots.debian.net to help users understand what my package offers and how it looks. I believe that these screenshots end up in software "stores" type of applications. Similarly, I add tags to my package using https://debtags.debian.org. I'm not entirely sure where these tags are used, but they are visible from apt show packagename.

Monitoring Upstream Releases Staying up-to-date with upstream releases is one of the most important duties of a software packager. There are a lot of different ways that upstream software authors publicize their new releases. Here are some of the things I do to monitor these releases:
  • I have a cronjob which run uscan once a day to check for new upstream releases using the information specified in my debian/watch files:
      0 12 * * 1-5   francois  test -e /home/francois/devel/deb && HTTPS_PROXY= https_proxy= uscan --report /home/francois/devel/deb   true
    
  • I subscribe to the upstream project's releases RSS feed, if available. For example, I subscribe to the GitHub tags feed for git-secrets and Launchpad announcements for email-reminder.
  • If the upstream project maintains an announcement mailing list, I subscribe to it (e.g. rkhunter-announce or tor release announcements).
When nothing else is available, I write a cronjob that downloads the upstream changelog once a day and commits it to a local git repo:
#!/bin/bash
pushd /home/francois/devel/zlib-changelog > /dev/null
wget --quiet -O ChangeLog.txt https://zlib.net/ChangeLog.txt   exit 1
git diff
git commit -a -m "Updated changelog" > /dev/null
popd > /dev/null
This sends me a diff by email when a new release is added (and no emails otherwise).

23 September 2023

Sergio Talens-Oliag: GitLab CI/CD Tips: Using Rule Templates

This post describes how to define and use rule templates with semantic names using extends or !reference tags, how to define manual jobs using the same templates and how to use gitlab-ci inputs as macros to give names to regular expressions used by rules.

Basic rule templatesI keep my templates in a rules.yml file stored on a common repository used from different projects as I mentioned on my previous post, but they can be defined anywhere, the important thing is that the files that need them include their definition somehow. The first version of my rules.yml file was as follows:
.rules_common:
  # Common rules; we include them from others instead of forcing a workflow
  rules:
    # Disable branch pipelines while there is an open merge request from it
    - if: >-
        $CI_COMMIT_BRANCH &&
        $CI_OPEN_MERGE_REQUESTS &&
        $CI_PIPELINE_SOURCE != "merge_request_event"
      when: never
.rules_default:
  # Default rules, we need to add the when: on_success to make things work
  rules:
    - !reference [.rules_common, rules]
    - when: on_success
The main idea is that .rules_common defines a rule section to disable jobs as we can do on a workflow definition; in our case common rules only have if rules that apply to all jobs and are used to disable them. The example includes one that avoids creating duplicated jobs when we push to a branch that is the source of an open MR as explained here. To use the rules in a job we have two options, use the extends keyword (we do that when we want to use the rule as is) or declare a rules section and add a !reference to the template we want to use as described here (we do that when we want to add additional rules to disable a job before evaluating the template conditions). As an example, with the following definitions both jobs use the same rules:
job_1:
  extends:
    - .rules_default
  [...]
job_2:
  rules:
    - !reference [.rules_default, rules]
  [...]

Manual jobs and rule templatesTo make the jobs manual we have two options, create a version of the job that includes when: manual and defines if we want it to be optional or not (allow_failure: true makes the job optional, if we don t add that to the rule the job is blocking) or add the when: manual and the allow_failure value to the job (if we work at the job level the default value for allow_failure is false for when: manual, so it is optional by default, we have to add an explicit allow_failure = true it to make it blocking). The following example shows how we define blocking or optional manual jobs using rules with when conditions:
.rules_default_manual_blocking:
  # Default rules for optional manual jobs
  rules:
    - !reference [.rules_common, rules]
    - when: manual
      # allow_failure: false is implicit
.rules_default_manual_optional:
  # Default rules for optional manual jobs
  rules:
    - !reference [.rules_common, rules]
    - when: manual
      allow_failure: true
manual_blocking_job:
  extends:
    - .rules_default_manual_blocking
  [...]
manual_optional_job:
  extends:
    - .rules_default_manual_optional
  [...]
The problem here is that we have to create new versions of the same rule template to add the conditions, but we can avoid it using the keywords at the job level with the original rules to get the same effect; the following definitions create jobs equivalent to the ones defined earlier without creating additional templates:
manual_blocking_job:
  extends:
    - .rules_default
  when: manual
  allow_failure: false
  [...]
manual_optional_job:
  extends:
    - .rules_default
  when: manual
  # allow_failure: true is implicit
  [...]
As you can imagine, that is my preferred way of doing it, as it keeps the rules.yml file smaller and I see that the job is manual in its definition without problem.

Rules with allow_failure, changes, exists, needs or variablesUnluckily for us, for now there is no way to avoid creating additional templates as we did on the when: manual case when a rule is similar to an existing one but adds changes, exists, needs or variables to it. So, for now, if a rule needs to add any of those fields we have to copy the original rule and add the keyword section. Some notes, though:
  • we only need to add allow_failure if we want to change its value for a given condition, in other cases we can set the value at the job level.
  • if we are adding changes to the rule it is important to make sure that they are going to be evaluated as explained here.
  • when we add a needs value to a rule for a specific condition and it matches it replaces the job needs section; when using templates I would use two different job names with different conditions instead of adding a needs on a single job.

Defining rule templates with semantic namesI started to use rule templates to avoid repetition when defining jobs that needed the same rules and soon I noticed that giving them names with a semantic meaning they where easier to use and understand (we provide a name that tells us when we are going to execute the job, while the details of the variables names or values used on the rules are an implementation detail of the templates). We are not going to define real jobs on this post, but as an example we are going to define a set of rules that can be useful if we plan to follow a scaled trunk based development workflow when developing, that is, we are going to put the releasable code on the main branch and use short-lived branches to test and complete changes before pushing things to main. Using this approach we can define an initial set of rule templates with semantic names:
.rules_mr_to_main:
  rules:
    - !reference [.rules_common, rules]
    - if: $CI_MERGE_REQUEST_TARGET_BRANCH_NAME == 'main'
.rules_mr_or_push_to_main:
  rules:
    - !reference [.rules_common, rules]
    - if: $CI_MERGE_REQUEST_TARGET_BRANCH_NAME == 'main'
    - if: >-
        $CI_COMMIT_BRANCH == 'main'
        &&
        $CI_PIPELINE_SOURCE != 'merge_request_event'
.rules_push_to_main:
  rules:
    - !reference [.rules_common, rules]
    - if: >-
        $CI_COMMIT_BRANCH == 'main'
        &&
        $CI_PIPELINE_SOURCE != 'merge_request_event'
.rules_push_to_branch:
  rules:
    - !reference [.rules_common, rules]
    - if: >-
        $CI_COMMIT_BRANCH != 'main'
        &&
        $CI_PIPELINE_SOURCE != 'merge_request_event'
.rules_push_to_branch_or_mr_to_main:
  rules:
    - !reference [.rules_push_to_branch, rules]
    - if: >-
         $CI_MERGE_REQUEST_SOURCE_BRANCH_NAME != 'main'
         &&
         $CI_MERGE_REQUEST_TARGET_BRANCH_NAME == 'main'
.rules_release_tag:
  rules:
    - !reference [.rules_common, rules]
    - if: $CI_COMMIT_TAG =~ /^([0-9a-zA-Z_.-]+-)?v\d+.\d+.\d+$/
.rules_non_release_tag:
  rules:
    - !reference [.rules_common, rules]
    - if: $CI_COMMIT_TAG !~ /^([0-9a-zA-Z_.-]+-)?v\d+.\d+.\d+$/
With those names it is clear when a job is going to be executed and when using the templates on real jobs we can add additional restrictions and make the execution manual if needed as described earlier.

Using inputs as macrosOn the previous rules we have used a regular expression to identify the release tag format and assumed that the general branches are the ones with a name different than main; if we want to force a format for those branch names we can replace the condition != 'main' by a regex comparison (=~ if we look for matches, !~ if we want to define valid branch names removing the invalid ones). When testing the new gitlab-ci inputs my colleague Jorge noticed that if you keep their default value they basically work as macros. The variables declared as inputs can t hold YAML values, the truth is that their value is always a string that is replaced by the value assigned to them when including the file (if given) or by their default value, if defined. If you don t assign a value to an input variable when including the file that declares it its occurrences are replaced by its default value, making them work basically as macros; this is useful for us when working with strings that can t managed as variables, like the regular expressions used inside if conditions. With those two ideas we can add the following prefix to the rules.yaml defining inputs for both regular expressions and replace the rules that can use them by the ones shown here:
spec:
  inputs:
    # Regular expression for branches; the prefix matches the type of changes
    # we plan to work on inside the branch (we use conventional commit types as
    # the branch prefix)
    branch_regex:
      default: '/^(build ci chore docs feat fix perf refactor style test)\/.+$/'
    # Regular expression for tags
    release_tag_regex:
      default: '/^([0-9a-zA-Z_.-]+-)?v\d+.\d+.\d+$/'
---
[...]
.rules_push_to_changes_branch:
  rules:
    - !reference [.rules_common, rules]
    - if: >-
        $CI_COMMIT_BRANCH =~ $[[ inputs.branch_regex ]]
        &&
        $CI_PIPELINE_SOURCE != 'merge_request_event'
.rules_push_to_branch_or_mr_to_main:
  rules:
    - !reference [.rules_push_to_branch, rules]
    - if: >-
         $CI_MERGE_REQUEST_SOURCE_BRANCH_NAME =~ $[[ inputs.branch_regex ]]
         &&
         $CI_MERGE_REQUEST_TARGET_BRANCH_NAME == 'main'
.rules_release_tag:
  rules:
    - !reference [.rules_common, rules]
    - if: $CI_COMMIT_TAG =~ $[[ inputs.release_tag_regex ]]
.rules_non_release_tag:
  rules:
    - !reference [.rules_common, rules]
    - if: $CI_COMMIT_TAG !~ $[[ inputs.release_tag_regex ]]

Creating rules reusing existing onesI m going to finish this post with a comment about how I avoid defining extra rule templates in some common cases. The idea is simple, we can use !reference tags to fine tune rules when we need to add conditions to disable them simply adding conditions with when: never before referencing the template. As an example, in some projects I m using different job definitions depending on the DEPLOY_ENVIRONMENT value to make the job manual or automatic; as we just said we can define different jobs referencing the same rule adding a condition to check if the environment is the one we are interested in:
deploy_job_auto:
  rules:
    # Only deploy automatically if the environment is 'dev' by skipping this job
    # for other values of the DEPLOY_ENVIRONMENT variable
    - if: $DEPLOY_ENVIRONMENT != "dev"
      when: never
    - !reference [.rules_release_tag, rules]
  [...]
deploy_job_manually:
  rules:
    # Disable this job if the environment is 'dev'
    - if: $DEPLOY_ENVIRONMENT == "dev"
      when: never
    - !reference [.rules_release_tag, rules]
  when: manual
  # Change this to  false  to make the deployment job blocking
  allow_failure: true
  [...]
If you think about it the idea of adding negative conditions is what we do with the .rules_common template; we add conditions to disable the job before evaluating the real rules. The difference in that case is that we reference them at the beginning because we want those negative conditions on all jobs and that is also why we have a .rules_default condition with an when: on_success for the jobs that only need to respect the default workflow (we need the last condition to make sure that they are executed if the negative rules don t match).

16 September 2023

Sergio Talens-Oliag: GitLab CI/CD Tips: Using a Common CI Repository with Assets

This post describes how to handle files that are used as assets by jobs and pipelines defined on a common gitlab-ci repository when we include those definitions from a different project.

Problem descriptionWhen a .giltlab-ci.yml file includes files from a different repository its contents are expanded and the resulting code is the same as the one generated when the included files are local to the repository. In fact, even when the remote files include other files everything works right, as they are also expanded (see the description of how included files are merged for a complete explanation), allowing us to organise the common repository as we want. As an example, suppose that we have the following script on the assets/ folder of the common repository:
dumb.sh
#!/bin/sh
echo "The script arguments are: '$@'"
If we run the following job on the common repository:
job:
  script:
    - $CI_PROJECT_DIR/assets/dumb.sh ARG1 ARG2
the output will be:
The script arguments are: 'ARG1 ARG2'
But if we run the same job from a different project that includes the same job definition the output will be different:
/scripts-23-19051/step_script: eval: line 138: d./assets/dumb.sh: not found
The problem here is that we include and expand the YAML files, but if a script wants to use other files from the common repository as an asset (configuration file, shell script, template, etc.), the execution fails if the files are not available on the project that includes the remote job definition.

SolutionsWe can solve the issue using multiple approaches, I ll describe two of them:
  • Create files using scripts
  • Download files from the common repository

Create files using scriptsOne way to dodge the issue is to generate the non YAML files from scripts included on the pipelines using HERE documents. The problem with this approach is that we have to put the content of the files inside a script on a YAML file and if it uses characters that can be replaced by the shell (remember, we are using HERE documents) we have to escape them (error prone) or encode the whole file into base64 or something similar, making maintenance harder. As an example, imagine that we want to use the dumb.sh script presented on the previous section and we want to call it from the same PATH of the main project (on the examples we are using the same folder, in practice we can create a hidden folder inside the project directory or use a PATH like /tmp/assets-$CI_JOB_ID to leave things outside the project folder and make sure that there will be no collisions if two jobs are executed on the same place (i.e. when using a ssh runner). To create the file we will use hidden jobs to write our script template and reference tags to add it to the scripts when we want to use them. Here we have a snippet that creates the file with cat:
.file_scripts:
  create_dumb_sh:
    -  
      # Create dumb.sh script
      mkdir -p "$ CI_PROJECT_DIR /assets"
      cat >"$ CI_PROJECT_DIR /assets/dumb.sh" <<EOF
      #!/bin/sh
      echo "The script arguments are: '\$@'"
      EOF
      chmod +x "$ CI_PROJECT_DIR /assets/dumb.sh"
Note that to make things work we ve added 6 spaces before the script code and escaped the dollar sign. To do the same using base64 we replace the previous snippet by this:
.file_scripts:
  create_dumb_sh:
    -  
      # Create dumb.sh script
      mkdir -p "$ CI_PROJECT_DIR /assets"
      base64 -d >"$ CI_PROJECT_DIR /assets/dumb.sh" <<EOF
      IyEvYmluL3NoCmVjaG8gIlRoZSBzY3JpcHQgYXJndW1lbnRzIGFyZTogJyRAJyIK
      EOF
      chmod +x "$ CI_PROJECT_DIR /assets/dumb.sh"
Again, we have to indent the base64 version of the file using 6 spaces (all lines of the base64 output have to be indented) and to make changes we have to decode and re-code the file manually, making it harder to maintain. With either version we just need to add a !reference before using the script, if we add the call on the first lines of the before_script we can use the downloaded file in the before_script, script or after_script sections of the job without problems:
job:
  before_script:
    - !reference [.file_scripts, create_dumb_sh]
  script:
    - $ CI_PROJECT_DIR /assets/dumb.sh ARG1 ARG2
The output of a pipeline that uses this job will be the same as the one shown in the original example:
The script arguments are: 'ARG1 ARG2'

Download the files from the common repositoryAs we ve seen the previous solution works but is not ideal as it makes the files harder to read, maintain and use. An alternative approach is to keep the assets on a directory of the common repository (in our examples we will name it assets) and prepare a YAML file that declares some variables (i.e. the URL of the templates project and the PATH where we want to download the files) and defines a script fragment to download the complete folder. Once we have the YAML file we just need to include it and add a reference to the script fragment at the beginning of the before_script of the jobs that use files from the assets directory and they will be available when needed. The following file is an example of the YAML file we just mentioned:
bootstrap.yml
variables:
  CI_TMPL_API_V4_URL: "$ CI_API_V4_URL /projects/common%2Fci-templates"
  CI_TMPL_ARCHIVE_URL: "$ CI_TMPL_API_V4_URL /repository/archive"
  CI_TMPL_ASSETS_DIR: "/tmp/assets-$ CI_JOB_ID "
.scripts_common:
  bootstrap_ci_templates:
    -  
      # Downloading assets
      echo "Downloading assets"
      mkdir -p "$CI_TMPL_ASSETS_DIR"
      wget -q -O - --header="PRIVATE-TOKEN: $CI_TMPL_READ_TOKEN" \
        "$CI_TMPL_ARCHIVE_URL?path=assets&sha=$ CI_TMPL_REF:-main "  
        tar --strip-components 2 -C "$ASSETS_DIR" -xzf -
The file defines the following variables:
  • CI_TMPL_API_V4_URL: URL of the common project, in our case we are using the project ci-templates inside the common group (note that the slash between the group and the project is escaped, that is needed to reference the project by name, if we don t like that approach we can replace the url encoded path by the project id, i.e. we could use a value like $ CI_API_V4_URL /projects/31)
  • CI_TMPL_ARCHIVE_URL: Base URL to use the gitlab API to download files from a repository, we will add the arguments path and sha to select which sub path to download and from which commit, branch or tag (we will explain later why we use the CI_TMPL_REF, for now just keep in mind that if it is not defined we will download the version of the files available on the main branch when the job is executed).
  • CI_TMPL_ASSETS_DIR: Destination of the downloaded files.
And uses variables defined in other places:
  • CI_TMPL_READ_TOKEN: token that includes the read_api scope for the common project, we need it because the tokens created by the CI/CD pipelines of other projects can t be used to access the api of the common one.We define the variable on the gitlab CI/CD variables section to be able to change it if needed (i.e. if it expires)
  • CI_TMPL_REF: branch or tag of the common repo from which to get the files (we need that to make sure we are using the right version of the files, i.e. when testing we will use a branch and on production pipelines we can use fixed tags to make sure that the assets don t change between executions unless we change the reference).We will set the value on the .gitlab-ci.yml file of the remote projects and will use the same reference when including the files to make sure that everything is coherent.
This is an example YAML file that defines a pipeline with a job that uses the script from the common repository:
pipeline.yml
include:
  - /bootstrap.yaml
stages:
  - test
dumb_job:
  stage: test
  before_script:
    - !reference [.bootstrap_ci_templates, create_dumb_sh]
  script:
    - $ CI_TMPL_ASSETS_DIR /dumb.sh ARG1 ARG2
To use it from an external project we will use the following gitlab ci configuration:
gitlab-ci.yml
include:
  - project: 'common/ci-templates'
    ref: &ciTmplRef 'main'
    file: '/pipeline.yml'
variables:
  CI_TMPL_REF: *ciTmplRef
Where we use a YAML anchor to ensure that we use the same reference when including and when assigning the value to the CI_TMPL_REF variable (as far as I know we have to pass the ref value explicitly to know which reference was used when including the file, the anchor allows us to make sure that the value is always the same in both places). The reference we use is quite important for the reproducibility of the jobs, if we don t use fixed tags or commit hashes as references each time a job that downloads the files is executed we can get different versions of them. For that reason is not a bad idea to create tags on our common repo and use them as reference on the projects or branches that we want to behave as if their CI/CD configuration was local (if we point to a fixed version of the common repo the way everything is going to work is almost the same as having the pipelines directly in our repo). But while developing pipelines using branches as references is a really useful option; it allows us to re-run the jobs that we want to test and they will download the latest versions of the asset files on the branch, speeding up the testing process. However keep in mind that the trick only works with the asset files, if we change a job or a pipeline on the YAML files restarting the job is not enough to test the new version as the restart uses the same job created with the current pipeline. To try the updated jobs we have to create a new pipeline using a new action against the repository or executing the pipeline manually.

ConclusionFor now I m using the second solution and as it is working well my guess is that I ll keep using that approach unless giltab itself provides a better or simpler way of doing the same thing.

10 September 2023

Freexian Collaborators: Debian Contributions: /usr-merge updates, Salsa CI progress, DebConf23 lead-up, and more! (by Utkarsh Gupta)

Contributing to Debian is part of Freexian s mission. This article covers the latest achievements of Freexian and their collaborators. All of this is made possible by organizations subscribing to our Long Term Support contracts and consulting services.

/usr-merge work, by Helmut Grohne, et al. Given that we now have consensus on moving forward by moving aliased files from / to /usr, we will also run into the problems that the file move moratorium was meant to prevent. The way forward is detecting them early and applying workarounds on a per-package basis. Said detection is now automated using the Debian Usr Merge Analysis Tool. As problems are reported to the bug tracking system, they are connected to the reports if properly usertagged. Bugs and patches for problem categories DEP17-P2 and DEP17-P6 have been filed. After consensus has been reached on the bootstrapping matters, debootstrap has been changed to swap the initial unpack and merging to avoid unpack errors due to pre-existing links. This is a precondition for having base-files install the aliasing symbolic links eventually. It was identified that the root filesystem used by the Debian installer is still unmerged and a change has been proposed. debhelper was changed to recognize systemd units installed to /usr. A discussion with the CTTE and release team on repealing the moratorium has been initiated.

Salsa CI work, by Santiago Ruano Rinc n August was a busy month in the Salsa CI world. Santiago reviewed and merged a bunch of MRs that have improved the project in different aspects: The aptly job got two MRs from Philip Hands. With the first one, the aptly now can export a couple of variables in a dotenv file, and with the second, it can include packages from multiple artifact directories. These MRs bring the base to improve how to test reverse dependencies with Salsa CI. Santiago is working on documenting this. As a result of the mass bug filing done in August, Salsa CI now includes a job to test how a package builds twice in a row. Thanks to the MRs of Sebastiaan Couwenberg and Johannes Schauer Marin Rodrigues. Last but not least, Santiago helped Johannes Schauer Marin Rodrigues to complete the support for arm64-only pipelines.

DebConf23 lead-up, by Stefano Rivera Stefano wears a few hats in the DebConf organization and in the lead up to the conference in mid-September, they ve all been quite busy. As one of the treasurers of DebConf 23, there has been a final budget update, and quite a few payments to coordinate from Debian s Trusted Organizations. We try to close the books from the previous conference at the next one, so a push was made to get DebConf 22 account statements out of TOs and record them in the conference ledger. As a website developer, we had a number of registration-related tasks, emailing attendees and trying to estimate numbers for food and accommodation. As a conference committee member, the job was mostly taking calls and helping the local team to make decisions on urgent issues. For example, getting conference visas issued to attendees required getting political approval from the Indian government. We only discovered the full process for this too late to clear some complex cases, so this required some hard calls on skipping some countries from the application list, allowing everyone else to get visas in time. Unfortunate, but necessary.

Miscellaneous contributions
  • Rapha l Hertzog updated gnome-shell-extension-hamster to a new upstream git snapshot that is compatible with GNOME Shell 44 that was recently uploaded to Debian unstable/testing. This extension makes it easy to start/stop tracking time with Hamster Time Tracker. Very handy for consultants like us who are billing their work per hour.
  • Rapha l also updated zim to the latest upstream release (0.74.2). This is a desktop wiki that can be very useful as a note-taking tool to build your own personal knowledge base or even to manage your personal todo lists.
  • Utkarsh reviewed and sponsored some uploads from mentors.debian.net.
  • Utkarsh helped the local team and the bursary team with some more DebConf activities and helped finalize the data.
  • Thorsten tried to update package hplip. Unfortunately upstream added some new compressed files that need to appear uncompressed in the package. Even though this sounded like an easy task, which seemed to be already implemented in the current debian/rules, the new type of files broke this implementation and made the package no longer buildable. The problem has been solved and the upload will happen soon.
  • Helmut sent 7 patches for cross build failures. Since dpkg-buildflags now defaults to issue arm64-specific compiler flags, more care is needed to distinguish between build architecture flags and host architecture flags than previously.
  • Stefano pushed the final bit of the tox 4 transition over the line in Debian, allowing dh-python and tox 4 to migrate to testing. We got caught up in a few unusual bugs in tox and the way we run it in Debian package building (which had to change with tox 4). This resulted in a couple of patches upstream.
  • Stefano visited Haifa, Israel, to see the proposed DebConf 24 venue and meet with the local team. While the venue isn t committed yet, we have high hopes for it.

8 September 2023

Reproducible Builds: Reproducible Builds in August 2023

Welcome to the August 2023 report from the Reproducible Builds project! In these reports we outline the most important things that we have been up to over the past month. As a quick recap, whilst anyone may inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries. The motivation behind the reproducible builds effort is to ensure no flaws have been introduced during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised. If you are interested in contributing to the project, please visit our Contribute page on our website.

Rust serialisation library moving to precompiled binaries Bleeping Computer reported that Serde, a popular Rust serialization framework, had decided to ship its serde_derive macro as a precompiled binary. As Ax Sharma writes:
The move has generated a fair amount of push back among developers who worry about its future legal and technical implications, along with a potential for supply chain attacks, should the maintainer account publishing these binaries be compromised.
After intensive discussions, use of the precompiled binary was phased out.

Reproducible builds, the first ten years On August 4th, Holger Levsen gave a talk at BornHack 2023 on the Danish island of Funen titled Reproducible Builds, the first ten years which promised to contain:
[ ] an overview about reproducible builds, the past, the presence and the future. How it started with a small [meeting] at DebConf13 (and before), how it grew from being a Debian effort to something many projects work on together, until in 2021 it was mentioned in an executive order of the president of the United States. (HTML slides)
Holger repeated the talk later in the month at Chaos Communication Camp 2023 in Zehdenick, Germany: A video of the talk is available online, as are the HTML slides.

Reproducible Builds Summit Just another reminder that our upcoming Reproducible Builds Summit is set to take place from October 31st November 2nd 2023 in Hamburg, Germany. Our summits are a unique gathering that brings together attendees from diverse projects, united by a shared vision of advancing the Reproducible Builds effort. During this enriching event, participants will have the opportunity to engage in discussions, establish connections and exchange ideas to drive progress in this vital field. If you re interested in joining us this year, please make sure to read the event page, the news item, or the invitation email that Mattia Rizzolo sent out, which have more details about the event and location. We are also still looking for sponsors to support the event, so do reach out to the organizing team if you are able to help. (Also of note that PackagingCon 2023 is taking place in Berlin just before our summit, and their schedule has just been published.)

Vagrant Cascadian on the Sustain podcast Vagrant Cascadian was interviewed on the SustainOSS podcast on reproducible builds:
Vagrant walks us through his role in the project where the aim is to ensure identical results in software builds across various machines and times, enhancing software security and creating a seamless developer experience. Discover how this mission, supported by the Software Freedom Conservancy and a broad community, is changing the face of Linux distros, Arch Linux, openSUSE, and F-Droid. They also explore the challenges of managing random elements in software, and Vagrant s vision to make reproducible builds a standard best practice that will ideally become automatic for users. Vagrant shares his work in progress and their commitment to the last mile problem.
The episode is available to listen (or download) from the Sustain podcast website. As it happens, the episode was recorded at FOSSY 2023, and the video of Vagrant s talk from this conference (Breaking the Chains of Trusting Trust is now available on Archive.org: It was also announced that Vagrant Cascadian will be presenting at the Open Source Firmware Conference in October on the topic of Reproducible Builds All The Way Down.

On our mailing list Carles Pina i Estany wrote to our mailing list during August with an interesting question concerning the practical steps to reproduce the hello-traditional package from Debian. The entire thread can be viewed from the archive page, as can Vagrant Cascadian s reply.

Website updates Rahul Bajaj updated our website to add a series of environment variations related to reproducible builds [ ], Russ Cox added the Go programming language to our projects page [ ] and Vagrant Cascadian fixed a number of broken links and typos around the website [ ][ ][ ].

Software development In diffoscope development this month, versions 247, 248 and 249 were uploaded to Debian unstable by Chris Lamb, who also added documentation for the new specialize_as method and expanding the documentation of the existing specialize as well [ ]. In addition, Fay Stegerman added specialize_as and used it to optimise .smali comparisons when decompiling Android .apk files [ ], Felix Yan and Mattia Rizzolo corrected some typos in code comments [ , ], Greg Chabala merged the RUN commands into single layer in the package s Dockerfile [ ] thus greatly reducing the final image size. Lastly, Roland Clobus updated tool descriptions to mark that the xb-tool has moved package within Debian [ ].
reprotest is our tool for building the same source code twice in different environments and then checking the binaries produced by each build for any differences. This month, Vagrant Cascadian updated the packaging to be compatible with Tox version 4. This was originally filed as Debian bug #1042918 and Holger Levsen uploaded this to change to Debian unstable as version 0.7.26 [ ].

Distribution work In Debian, 28 reviews of Debian packages were added, 14 were updated and 13 were removed this month adding to our knowledge about identified issues. A number of issue types were added, including Chris Lamb adding a new timestamp_in_documentation_using_sphinx_zzzeeksphinx_theme toolchain issue.
In August, F-Droid added 25 new reproducible apps and saw 2 existing apps switch to reproducible builds, making 191 apps in total that are published with Reproducible Builds and using the upstream developer s signature. [ ]
Bernhard M. Wiedemann published another monthly report about reproducibility within openSUSE.

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In August, a number of changes were made by Holger Levsen:
  • Debian-related changes:
    • Disable Debian live image creation jobs until an OpenQA credential problem has been fixed. [ ]
    • Run our maintenance scripts every 3 hours instead of every 2. [ ]
    • Export data for unstable to the reproducible-tracker.json data file. [ ]
    • Stop varying the build path, we want reproducible builds. [ ]
    • Temporarily stop updating the pbuilder.tgz for Debian unstable due to #1050784. [ ][ ]
    • Correctly document that we are not variying usrmerge. [ ][ ]
    • Mark two armhf nodes (wbq0 and jtx1a) as down; investigation is needed. [ ]
  • Misc:
    • Force reconfiguration of all Jenkins jobs, due to the recent rise of zombie processes. [ ]
    • In the node health checks, also try to restart failed ntpsec, postfix and vnstat services. [ ][ ][ ]
  • System health checks:
    • Detect Debian live build failures due to missing credentials. [ ][ ]
    • Ignore specific types of known zombie processes. [ ][ ]
In addition, Vagrant Cascadian updated the scripts to use a predictable build path that is consistent with the one used on buildd.debian.org. [ ][ ]

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

Next.

Previous.